Here's a short writeup of a fundamental issue that Vernon keeps wrestling with.
The recurring question
Here's a visual way of describing the problem, with two examples. In both of these examples, there is a core experiment and there are two optional extensions. (I'm using "extension" in the generic sense -- sometimes it refers to a mixin, sometimes it refers to an alternate version of an experiment). In both cases, additional code needs to be written to handle the scenario where the two extensions are used together. The fundamental question is: where should this Extension1 x Extension2 code live?
Example 1: Distributed experiments with extra validations
Each rectangle in this picture denotes code that needs to live somewhere.
Some experiments need to aggregate validation results. Some experiments need to run multiple validations per epoch. Some do both, and need to also aggregate the results from these extra validations.
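A rough Python sketch of the three pieces and the cross term may make this concrete. The class and method names are illustrative stand-ins, not Vernon's actual code, and the "aggregation" is a simple sum chosen only to keep the example small.

```python
class Experiment:
    """Core experiment: one validation per epoch, single process."""

    def validate(self):
        # Placeholder result; a real experiment would evaluate a model here.
        return {"loss": 1.0, "count": 10}

    def run_epoch(self):
        return {"validation": self.validate()}


class DistributedExperiment(Experiment):
    """Extension 1: aggregates validation results from several workers."""

    @classmethod
    def aggregate_validation_results(cls, results):
        return {
            "loss": sum(r["loss"] for r in results),
            "count": sum(r["count"] for r in results),
        }

    @classmethod
    def aggregate_results(cls, worker_results):
        return {
            "validation": cls.aggregate_validation_results(
                [r["validation"] for r in worker_results]
            )
        }


class ExtraValidations:
    """Extension 2: runs additional validations within an epoch."""

    extra_validations_per_epoch = 2

    def run_epoch(self):
        result = super().run_epoch()
        result["extra_validations"] = [
            self.validate() for _ in range(self.extra_validations_per_epoch)
        ]
        return result


class ExtraValidationsDistributed(ExtraValidations):
    """Extension1 x Extension2: also aggregate the extra validations."""

    @classmethod
    def aggregate_results(cls, worker_results):
        result = super().aggregate_results(worker_results)
        per_worker = [r["extra_validations"] for r in worker_results]
        # Combine the i-th extra validation of every worker.
        result["extra_validations"] = [
            cls.aggregate_validation_results(list(step))
            for step in zip(*per_worker)
        ]
        return result


class MyExperiment(ExtraValidationsDistributed, DistributedExperiment):
    pass
```

The last mixin only makes sense when both extensions are present: it relies on `run_epoch` from one and `aggregate_results` from the other. That is the code whose home is in question.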
I solved this specific example by creating a separate vernon.distributed module. (More on this later, see "Option 4" below.)
Example 2: Fixed duration experiments with varying batch size
(Background: In order to use certain learning schedules like OneCycleLR, our experiments need to have a fixed number of optimizer steps. In a future change, I want to make fixed schedules optional, no longer part of the core experiment.)
Some experiments have a fixed number of optimizer steps. Some experiments use different batch sizes in different epochs. Some do both, and need extra logic to compute the correct number of total optimizer steps.
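To make the "extra logic" concrete, here is a small sketch of the step count. It assumes the last partial batch of each epoch still counts as an optimizer step, and the function and parameter names are made up for illustration rather than taken from Vernon.

```python
import math


def total_steps_fixed(num_samples, batch_size, epochs):
    """Fixed batch size: every epoch has the same number of steps."""
    return epochs * math.ceil(num_samples / batch_size)


def total_steps_varying(num_samples, epochs, default_batch_size, batch_size_by_epoch):
    """Varying batch size: sum the per-epoch step counts."""
    total = 0
    for epoch in range(epochs):
        # Epochs without an override fall back to the default batch size.
        batch_size = batch_size_by_epoch.get(epoch, default_batch_size)
        total += math.ceil(num_samples / batch_size)
    return total
```

With 1000 samples, 3 epochs, and a batch size of 100, the fixed version gives 30 steps. If epoch 0 uses a batch size of 50 instead, the total becomes 20 + 10 + 10 = 40. Neither extension can compute that number alone; it needs to know about both.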
The solution space
Option 1: Implement all the methods, don't worry about it
This is a hacky solution that leaves a bunch of broken methods hanging off of the experiment class. The goal of Vernon is to make it easy to create any experiment class you want while reusing code. Most people do not want experiment classes that have a bunch of dangling broken methods. Also, this will break our Interface functionality; in the current code, the experiment class will fail to instantiate if you use a mixin with distributed logic with a non-distributed experiment class.
Option 2: Create different versions of mixins that implement additional interfaces
This is architecturally a good solution, but it provides a bad user interface. The user needs to carefully choose which variant of each mixin to use. When running distributed, they must use ExtraValidationsDistributed, not ExtraValidations. They can use LogEveryLearningRate as-is, but not LogEveryLoss (instead, they should use LogEveryLossDistributed). This will be error-prone, even for people who are Vernon experts.
Option 3: Do option 2, and add error checks to make sure you're not using the wrong mixin
This slightly improves the user interface, but it would essentially be a guess-and-check API, and it would add a bunch of irrelevant error checks. Any mixin that has a Distributed version is punished; it is forced to take on complexity, checking whether the self is Distributed. But the whole point of this base mixin is that it is supposed to be oblivious of the existence of a Distributed version.
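A minimal sketch of what such a check could look like. The `Distributed` marker class and the constructor-time check are assumptions for illustration, not how Vernon's interfaces actually work.

```python
class Distributed:
    """Hypothetical marker for experiments that run across several workers."""


class LogEveryLoss:
    def __init__(self):
        # The base mixin now has to know that a Distributed variant exists.
        if isinstance(self, Distributed) and not isinstance(
            self, LogEveryLossDistributed
        ):
            raise TypeError(
                "LogEveryLoss was combined with a Distributed experiment; "
                "use LogEveryLossDistributed instead"
            )
        super().__init__()


class LogEveryLossDistributed(LogEveryLoss):
    """Variant that would also aggregate per-worker losses."""


class WrongCombination(LogEveryLoss, Distributed):
    pass


class RightCombination(LogEveryLossDistributed, Distributed):
    pass
```

Instantiating `WrongCombination` raises a `TypeError`, while `RightCombination` works. The cost is visible in `LogEveryLoss.__init__`: the supposedly oblivious base mixin references both `Distributed` and its own specialized variant.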
Option 4: Do option 2, but also create a separate vernon module (e.g. vernon.distributed) that makes choosing the right Distributed mixin easy
This works great, but you can only use this trick once. If you choose to use this trick for the Distributed interface, you can't also use it for the FixedDurationExperiment interface.
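One way to picture why the trick works only once, under the assumption that the separate module re-exports each mixin name bound to the appropriate variant. The layout of `vernon.distributed` is not shown here, so the namespaces below are stand-ins built with `types.SimpleNamespace` rather than real modules.

```python
from types import SimpleNamespace


class LogEveryLoss:
    pass


class LogEveryLossDistributed(LogEveryLoss):
    pass


class LogEveryLearningRate:
    pass


# Stand-in for the default module: plain mixins.
mixins = SimpleNamespace(
    LogEveryLoss=LogEveryLoss,
    LogEveryLearningRate=LogEveryLearningRate,
)

# Stand-in for `vernon.distributed`: same names, Distributed-aware variants
# where one exists, and the plain mixin where none is needed.
distributed = SimpleNamespace(
    LogEveryLoss=LogEveryLossDistributed,
    LogEveryLearningRate=LogEveryLearningRate,
)
```

The user picks a namespace once and every name in it is the right variant. A second axis such as FixedDurationExperiment would need its own namespace, and an experiment that is both distributed and fixed-duration would need a third one covering the combination, so the number of namespaces grows with every pair of interfaces.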
Option 5: Do option 2, and introduce library functionality for properly creating an Experiment class
Here's one quick sketch of what this API might look like.
This solution will work, but it changes the feel of using Vernon. Dependencies would need to be registered somewhere. This ExperimentClass function would have built-in logic that chooses the appropriate mixin classes given the context. Rather than relying on the Python syntax of class creation, the user needs to learn our API, though we could choose to make the API feel similar to Python class creation, as in the above code. This change would preserve Vernon's current core architecture, but it would change the experience of creating an experiment.
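As one hypothetical illustration of the "registered dependencies plus built-in selection logic" idea (not the sketch referred to above, and not an existing Vernon API), a registry could map a base mixin and an interface to the variant that should replace it. The `register_variant` helper, the registry, and the `ExperimentClass` signature are all assumptions.

```python
# Registry: (base mixin, interface) -> variant to use when both are present.
VARIANTS = {}


def register_variant(base, interface, variant):
    VARIANTS[(base, interface)] = variant


def ExperimentClass(name, bases, namespace=None):
    """Create a class, swapping each base for a registered variant when
    another base implements the interface that variant was written for."""
    resolved = []
    for base in bases:
        chosen = base
        for other in bases:
            if other is base:
                continue
            # Check every class the other base inherits from.
            for interface in other.__mro__:
                variant = VARIANTS.get((base, interface))
                if variant is not None:
                    chosen = variant
        resolved.append(chosen)
    return type(name, tuple(resolved), dict(namespace or {}))


class Experiment: ...
class Distributed(Experiment): ...
class LogEveryLoss: ...
class LogEveryLossDistributed(LogEveryLoss): ...
class LogEveryLearningRate: ...


register_variant(LogEveryLoss, Distributed, LogEveryLossDistributed)

# The user lists the plain mixins; the function picks the variants.
MyExperiment = ExperimentClass(
    "MyExperiment", (LogEveryLearningRate, LogEveryLoss, Distributed)
)
```

Here `MyExperiment` ends up inheriting from `LogEveryLossDistributed` without the user naming it, while the same call with a non-distributed experiment keeps the plain `LogEveryLoss`. An unsupported combination could raise an error at this single point instead of inside each mixin.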
I like this distillation of the fundamental issue Vernon has been struggling with.
You listed two examples. Are these the main ones you've been considering? Could you briefly list a few more?
I'm curious to get a better scope of the problem. I can appreciate the general issue you laid out here, but my mind is having a hard time thinking of more examples.
That said, if the problem is constantly recurring, I could see option 5 being a good way to go. Most of all, it seems more flexible and adaptable as we go. Currently, we just cater to combinations of Distributed + OtherMixin, but moving forward we could have unforeseen combinations like the one you describe in example 2. As we go, we can throw errors when certain combinations of mixins aren't supported and gradually expand what we do support. I imagine this is what you have in mind, correct?