[RFC] Engine Refactor Proposal #3752
base: develop
Conversation
Signed-off-by: Ashwin Vaidya <[email protected]>
Thank you for bringing back my initial design and reviewing it. I believe there are pros and cons to both the current and past designs: while this PR's design maximizes scalability, it looks quite different from the Engine's current policy, so there still seems to be a lot to discuss.
In my opinion, `AutoEngine` is not the right name. I think it would be better to keep `Engine` but define each backend's sub-engine as `**Engine`.
In my opinion, the main focus of this PR is refactoring the Engine-related interfaces for scalability.
Currently, our CLI already has the functionality of this code, so if this PoC is going to be merged into develop, we need to look at compatibility with our existing CLI code.
I think the following needs to be discussed for the existing CLI code with these changes:
- We need to make sure that our configuration allows us to receive a `DataModule` other than `OTXDataModule`. -> I think this is possible with some tweaks to `add_argument`.
- We need to think about how to build a `Dataset` from external frameworks other than `DataModule` in the CLI.

Anyway, for the CLI part, I think it's better to wait until the Engine's design is finalized and see if there are any issues.
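One possible direction for the first point — a sketch, not the actual OTX CLI code; `build_datamodule` and the `class_path`/`init_args` layout are assumptions modeled on jsonargparse conventions — is to resolve an arbitrary datamodule class from its import path so the CLI is not hard-wired to `OTXDataModule`:

```python
import importlib
from typing import Any


def build_datamodule(config: dict[str, Any]) -> Any:
    """Instantiate any datamodule from a jsonargparse-style config.

    Expects {"class_path": "pkg.module.ClassName", "init_args": {...}},
    so any DataModule implementation can be plugged in from YAML.
    """
    module_path, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**config.get("init_args", {}))


# Example with a stdlib class standing in for a datamodule:
dm = build_datamodule({"class_path": "collections.Counter", "init_args": {}})
```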
Yeah, the main intention of this PR is to propose the change to the Engine class. I included the CLI for completeness, and also to ensure that the new design does not break it.
```python
from otx.engine.base import METRICS, Adapter


def wrap_to_anomalib_datamodule(datamodule: OTXDataModule) -> AnomalibDataModule:
```
Do we really need this? Can't we just use `anomalib.data.Folder`?
(like https://github.com/openvinotoolkit/anomalib/blob/main/configs/data/folder.yaml)
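If the wrapper is kept, it usually only needs to delegate the standard dataloader hooks. A minimal, framework-agnostic sketch of that pattern — class and method names here are illustrative, not Anomalib's actual interface:

```python
class DataModuleWrapper:
    """Present a source datamodule under a target backend's interface
    by delegating the Lightning-style dataloader hooks."""

    def __init__(self, source):
        self._source = source

    def train_dataloader(self):
        return self._source.train_dataloader()

    def val_dataloader(self):
        return self._source.val_dataloader()

    def test_dataloader(self):
        return self._source.test_dataloader()


class _DummyDataModule:
    """Stand-in for an OTXDataModule, for illustration only."""

    def train_dataloader(self):
        return [1, 2, 3]

    def val_dataloader(self):
        return []

    def test_dataloader(self):
        return []


wrapped = DataModuleWrapper(_DummyDataModule())
```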
```python
class AnomalibAdapter(Adapter):
    def __init__(self):
        self._engine = AnomalibEngine()
```
It feels like the `Engine` is holding the `Adapter`, and the `Adapter` is holding the `Engine` in turn. Even if they have the same name, wouldn't it be better to use a different name to distinguish them (e.g. `trainer`)?
Yeah, the naming is super tricky here. The problem is that internally, the Anomalib Engine also has a trainer. So currently it is `Engine -> Adapter -> Engine (anomalib) -> Trainer`, which isn't the best. I am open to alternatives.
```python
def train(
    self,
    model: OTXModel | AnomalyModule | Model,
    datamodule: OTXDataModule | AnomalibDataModule | UltralyticsDataset,
    **kwargs,
) -> METRICS:
```
Currently, the arguments for each function in `Engine` are very well thought out (per command), so I think we should try to keep the existing arguments as much as possible.
The guiding principle of the OTX Engine is 1 Model = 1 Engine. -> I don't currently see any reason to break this principle.
Therefore, I think it is correct for `__init__` to receive the `model`.
```python
def train(
    self,
    max_epochs: int = 10,
    seed: int | None = None,
    deterministic: bool | Literal["warn"] = False,
    precision: _PRECISION_INPUT | None = "32",
    callbacks: list[Callback] | Callback | None = None,
    logger: Logger | Iterable[Logger] | bool | None = None,
    resume: bool = False,
    metric: MetricCallable | None = None,
    run_hpo: bool = False,
    hpo_config: HpoConfig = HpoConfig(),  # noqa: B008 https://github.com/omni-us/jsonargparse/issues/423
    checkpoint: PathLike | None = None,
    adaptive_bs: Literal["None", "Safe", "Full"] = "None",
    **kwargs,
) -> dict[str, Any]:
```
I like the current way of doing things better, where we clearly define what we need for each command as an argument to each function, as shown above. Also, I don't see any reason to change the model through a function call.
Current OTX: the Engine is created (the model is determined) -> the E2E flow uses this model.
This PR's design: there's room for the model to change with every command. -> Do we need this use case?
I believe the above was discussed and decided in the first Engine design proposal, and I don't see any reason to change it at this time.
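To make the two options being debated concrete, here is a minimal sketch of the contrast (the classes and return values are hypothetical, just to illustrate the two API shapes):

```python
class EngineCurrent:
    """Current OTX style: the model is fixed at construction (1 Model = 1 Engine)."""

    def __init__(self, model):
        self.model = model

    def train(self, max_epochs: int = 10) -> dict:
        return {"model": type(self.model).__name__, "epochs": max_epochs}


class EngineProposed:
    """This PR's style: the model may differ on every command call."""

    def train(self, model, max_epochs: int = 10) -> dict:
        return {"model": type(model).__name__, "epochs": max_epochs}


class MyModel:
    pass


# Same outcome, but the proposed style allows swapping models mid-flow.
current = EngineCurrent(MyModel()).train()
proposed = EngineProposed().train(MyModel(), max_epochs=5)
```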
Yeah, I prefer explicit arguments as well. I didn't include all of them here because I wanted to focus on the adapter. I also think we should have explicit arguments so that they will be serialised by the CLI when creating the YAML file. Maybe this will get polished after a few iterations of discussion.
Yes, I see, and the reason I commented was that I thought it would be a good idea to avoid breaking existing Engine Usage as much as possible.
```python
def adapter(self, adapter: Adapter) -> None:
    self._adapter = adapter


def get_adapter(self, model: OTXModel | AnomalyModule | Model) -> Adapter:
```
It's weird to select the adapter from the model. I think it would be better to distinguish adapters from models and have the adapter received in the `Engine`'s `__init__`, but what do you think?
-> I believe that adapters and models should be somewhat independent.

```python
def __init__(
    self,
    ...
    backend: Literal["base", "anomalib", "..."] = "base",
    ...
) -> None:
```
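A sketch of how such a `backend` argument could resolve an adapter internally, independent of the model (the mapping and adapter names are hypothetical):

```python
from typing import Literal


class BaseAdapter:
    name = "base"


class AnomalibAdapter(BaseAdapter):
    name = "anomalib"


# Declared mapping from backend name to adapter class.
_ADAPTERS: dict[str, type] = {
    "base": BaseAdapter,
    "anomalib": AnomalibAdapter,
}


class Engine:
    def __init__(self, backend: Literal["base", "anomalib"] = "base") -> None:
        # The adapter is chosen from the declared backend, not from the model type.
        self._adapter = _ADAPTERS[backend]()
```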
```python
logger = logging.getLogger(__name__)


class Engine:
```
I believe the current structure is an implementation that doesn't take auto-configuration into account at all. Is auto-configuration considered in the design?
In my opinion, auto-configuration is one of OTX's key features.
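For context, auto-configuration resolves sensible defaults (model, data settings) from the task, which any refactor would need a hook for. A toy sketch of the idea — the table contents below are made up, not OTX's actual defaults:

```python
# Hypothetical per-task defaults; real auto-configuration derives these
# from the task type and dataset statistics.
DEFAULTS: dict[str, dict] = {
    "classification": {"model": "efficientnet_b0", "batch_size": 64},
    "detection": {"model": "atss_mobilenetv2", "batch_size": 8},
}


def auto_configure(task: str, **overrides) -> dict:
    """Start from the per-task defaults, then apply explicit user overrides."""
    config = dict(DEFAULTS[task])
    config.update(overrides)
    return config
```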
```python
)


class UltralyticsTrainer(BaseTrainer):
```
Why is this class separate from Adapter?
To avoid any mistakes, I'm going to Request Changes to this PR first.
@harimkang an alternative to `Adapter` can be some sort of factory class. Since we pass the `model`, we can use it to pick the right engine:

```python
from abc import ABC, abstractmethod
from typing import Any

from anomalib.models import AnomalyModule
from lightning.pytorch import LightningModule


class BaseEngine(ABC):
    @abstractmethod
    def train(self, **kwargs):
        pass

    @classmethod
    @abstractmethod
    def is_valid_model(cls, model: Any) -> bool:
        pass


class EngineRegistry:
    engines = set()

    @classmethod
    def register(cls):
        def decorator(engine_class: BaseEngine):
            cls.engines.add(engine_class)
            return engine_class

        return decorator


@EngineRegistry.register()
class AnomalibEngine(BaseEngine):
    def __init__(self, model: AnomalyModule, **kwargs):
        print("Anomalib Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, AnomalyModule)


@EngineRegistry.register()
class LightningEngine(BaseEngine):
    def __init__(self, model: LightningModule, **kwargs):
        print("Lightning Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, LightningModule)
```

Then we can initialise the right engine something like this. Here `Engine` acts as the factory:

```python
class Engine:
    def __new__(
        cls, model: Any, **kwargs
    ) -> BaseEngine:  # pass all init parameters to this
        for engine in EngineRegistry.engines:
            if engine.is_valid_model(model):
                return engine(model, **kwargs)
```

And to test:

```python
class MockAnomalyModule(AnomalyModule):
    def __init__(self):
        pass

    def learning_type(self):
        pass

    def trainer_arguments(self):
        pass


class MockLightningModule(LightningModule):
    def __init__(self):
        pass


if __name__ == "__main__":
    model = MockAnomalyModule()
    engine = Engine(model)
    assert isinstance(engine, AnomalibEngine)

    model = MockLightningModule()
    engine = Engine(model)
    assert isinstance(engine, LightningEngine)
```
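One gap worth noting in a factory like this: if no registered engine accepts the model, `__new__` falls through and returns `None`. A self-contained guarded variant (a sketch; the error class name is made up):

```python
from typing import Any


class EngineRegistry:
    engines: set = set()

    @classmethod
    def register(cls):
        def decorator(engine_class):
            cls.engines.add(engine_class)
            return engine_class

        return decorator


class UnsupportedModelError(TypeError):
    """Raised when no registered engine accepts the given model."""


class Engine:
    def __new__(cls, model: Any, **kwargs):
        for engine in EngineRegistry.engines:
            if engine.is_valid_model(model):
                return engine(model, **kwargs)
        # Fail loudly instead of silently returning None.
        raise UnsupportedModelError(
            f"No registered engine supports {type(model).__name__}"
        )
```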
Is there no way to use https://github.com/openvinotoolkit/anomalib/blob/main/src/anomalib/models/image/padim/torch_model.py instead of the lightning model?
Could you clarify what components are included in anomalib trainer so as to customize OTX engine?
@wonjuleee good question. The separation of the torch model in Anomalib was something we had designed initially to make model export easier. However, over time the export mechanism has changed in Anomalib, and we are looking into embedding the post-processing in the exported model. Additionally, the Anomalib lightning model stores certain metadata that is necessary during different stages of the pipeline. These are themselves configured from callbacks in the engine. The callbacks are hidden so that users do not modify them inadvertently. While current OTX anomaly models inherit from Anomalib's models (like Padim), they also need these callbacks to work. While these are present in the task, they are quite limited and cannot be configured.
The lightning model also stores model-specific transforms that are used to configure the datamodule if required. Hope this clarifies your question.
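The transform hand-off described above can be sketched roughly like this (the class and attribute names are illustrative, not Anomalib's actual API):

```python
class Model:
    def __init__(self):
        # Model-specific default transform, e.g. its expected input size.
        self.transform = ("resize", 256)


class DataModule:
    def __init__(self, transform=None):
        self.transform = transform

    def setup_from_model(self, model: Model) -> None:
        # If the user supplied no transform, fall back to the model's default.
        if self.transform is None:
            self.transform = model.transform
```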
TL;DR
This PoC aims to propose and discuss refactoring of the current OTX Engine. It will help make the task more flexible, reduce code duplication, and help directly expose all features of the upstream framework. The code changes here are at a very high level.
Intro
This is a very limited PoC (think of it as pseudo-code) to get the discussions rolling. It is not as extensive in features as what Harim made last year (https://github.com/harimkang/training_extensions/tree/harimkan/feature-v2-ultralytics/src/otx/v2/adapters/torch/ultralytics).
The main reason is that the Anomalib Engine does a lot of things under the hood which would require major effort to implement in OTX. I'll share the presentation in Teams. It would be nice to get feedback on this to flesh it out further. Also, we can schedule 1:1s to discuss any concerns and modify accordingly.
Motivation
Approach
AutoEngine
as proposed by Harim in his initial design. see:engine_poc.py
AnomalibEngine
to directly use the Anomalib's own engine for training.Outcome
API Example
- Single `Engine` as the entry point for all backends.
- Allows us not only to use all the models of the other framework directly, but also to use its dataloaders.
- Ultralytics example just to show that we can extend it to other frameworks.
- Current OTX models are compatible and use the same entry points.
CLI Registration
- If the `Engine` class is decorated correctly to accept models from different frameworks, we can directly expose these models to the CLI.
- Models from the upstream repos are automatically exposed and can work with OTX.
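A rough sketch of what that registration could look like — the decorator and registry names here are hypothetical, just to show how decorated model classes become CLI choices:

```python
# Hypothetical registry the CLI would read to list available models.
MODEL_REGISTRY: dict[str, type] = {}


def register_model(name: str):
    """Record a model class so the CLI can expose it by name."""

    def decorator(model_class: type) -> type:
        MODEL_REGISTRY[name] = model_class
        return model_class

    return decorator


@register_model("padim")
class Padim:
    pass


@register_model("yolo")
class Yolo:
    pass


def list_cli_models() -> list[str]:
    # What something like `otx train --model <name>` could offer as choices.
    return sorted(MODEL_REGISTRY)
```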