Adding RegNets to tf.keras.applications #15702
Conversation
@AdityaKane2001
Yes, that is the case. Thank you :)
@innat Isn't adding ResNet18 and ResNet34 possible in a similar fashion?
@MrinalTyagi
Thanks again for opening this PR! I have two tiny suggestions to slightly improve readability.
@lgeiger Always appreciated! Thanks for the changes. I'll merge them tomorrow.
Co-authored-by: Lukas Geiger <[email protected]>
Thanks for the PR! Left a few comments. Do you have the weights for the applications available online somewhere?
Thanks for the review! Made requested changes.
I am still training these models, and I'm updating the tables at the start of the thread accordingly. I'll test the loading code in parallel.
I have completed training all the models and have updated the tables at the start of the thread accordingly. There are a couple of things I want to bring to your notice:
/cc @sayakpaul
I have made the requested changes. Please run the workflow again. I wanted to provide performance metrics for all the models, as on keras.io/applications. However, I am not able to spin up a VM instance with the given specs¹ on Google Cloud. Could you please tell me how to go about this? /cc @sayakpaul
Could you please take a look at this one? TIA
Thanks for the approval!
@mattdangerw could you also provide an update on what we should do about this?
Yeah, re the ideal machine for metrics, particularly the performance-per-step numbers, I am not sure. This is probably a question for @fchollet. We might take a bit to get back to you on this given it's the holidays and a lot of the team is out. In the meantime, I think we can move ahead trying to land this PR, as the table update will be a separate change to keras.io anyway.
I've uploaded weights and generated API files for this PR; overall, things are looking good. However, the change to `applications_load_weight_test` is breaking right now. I commented on the line. Is that change necessary? Thanks!
```diff
@@ -115,7 +125,7 @@ def test_application_pretrained_weights_loading(self):
     self.assertShapeEqual(model.output_shape, (None, _IMAGENET_CLASSES))
     x = _get_elephant(model.input_shape[1:3])
     x = app_module.preprocess_input(x)
-    preds = model.predict(x)
+    preds = model(x).numpy()
```
Is there a reason this needs to be updated? Do things break otherwise? We still run these test cases in TF1 without eager mode enabled, and this line is breaking a number of our tests.
Yes, the change is necessary. Grouped convolutions are not yet fully supported on CPUs: we see that `model.predict(X_test)` breaks, whereas `model(X_test)` works fine. There are a number of issues discussing this in the TF repo.
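To make the failure mode concrete, here is a minimal sketch (my assumption: any small model with a grouped convolution exercises the same kernel, not RegNet specifically) of the behavior described above:

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny model; groups > 1 triggers the grouped-convolution CPU kernel.
inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = tf.keras.layers.Conv2D(16, 3, groups=8, padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

x = np.random.rand(1, 32, 32, 16).astype("float32")
y = model(x).numpy()   # direct eager call: works
y = model.predict(x)   # compiled predict path: raised an error on CPU at the time
```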
I think you could trigger the failures by adding a call to `tf.compat.v1.disable_v2_behavior()` before the call to `tf.test.main` in `applications_load_weight_test`. We can't submit this if we are breaking all these application tests in a TF1 context. We would need to find a change that does not rely on eager-mode behavior (`.numpy()` is eager only).
This might mean we need to dig into the difference between a direct call and `predict` here. It sounds like this is an issue with grouped convolutions on CPU that only appears when compiling a `tf.function`, is that right?
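For reference, a minimal sketch of the suggested modification at the bottom of `applications_load_weight_test.py` (placement assumed from the comment above, not quoted from the actual file):

```python
import tensorflow as tf

if __name__ == "__main__":
    # Run the tests with TF1 behavior (graph mode, no eager .numpy()).
    tf.compat.v1.disable_v2_behavior()
    tf.test.main()
```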
I have found a small solution to this. I have tested the change both with TF2 and with `tf.compat.v1.disable_v2_behavior()`, and it works on my end. Could you please take a look and run the workflow again?
keras/keras/applications/applications_load_weight_test.py
Lines 127 to 132 in 851ca16

```python
x = app_module.preprocess_input(x)
try:
    preds = model.predict(x)  # Works in TF1
except:
    preds = model(x).numpy()  # Works in TF2
names = [p[1] for p in app_module.decode_predictions(preds)[0]]
```
I will take a look next week! Sorry for the delayed reply; most of the team is out this week. The proposed change would still not be submittable, because that fallback (the `.numpy()` call) would still run in a TF1 context for the RegNet load-weights test unless we disable it.
Overall, I think our options are:
1. Disable the load-weights test for RegNet (without removing the `predict` call here), and follow up with a fix.
2. Fix the underlying CPU/compiled-function/grouped-convolution issue, and then land this PR.
3. Work around the bug for RegNets somehow (the conversation here suggests that using `jit_compile=True` may allow CPU to work, which might give us a way forward).
I would say (3) would be the way to go if we can make it work. We really do want the load-weights tests to exercise the compiled predict function (that's how these models will often be used!), and shipping RegNets such that `predict` fails on CPU by default is not a great out-of-the-box experience.
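As a rough illustration of option (3), a sketch under assumptions, not the fix that eventually landed: recent TF releases let `Model.compile` take `jit_compile=True`, which routes `predict` through an XLA-compiled function, and XLA does handle grouped convolutions on CPU. The tiny grouped-conv model here is hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical model standing in for a RegNet's grouped convolutions.
inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = tf.keras.layers.Conv2D(16, 3, groups=8, padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

# jit_compile=True asks Keras to compile its predict/train/test functions with XLA.
model.compile(jit_compile=True)
preds = model.predict(np.random.rand(1, 32, 32, 16).astype("float32"))
```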
Will follow up next week when people are back in office. Thanks!
Thanks for the detailed explanation.
I guess (2) is not something that can be done in the Keras codebase, as the error is thrown in tensorflow/tensorflow/core/kernels/conv_ops_fused_impl.h. I'll open an issue in the TF repo about this. So I agree that (3) might be the best option.
Lastly, I wish you a very happy new year!
Still here on this! We think we have found a good workaround (option 3): forcing XLA compilation for grouped convolutions. #15868
Once that lands (assuming it doesn't run into roadblocks), we can submit this without modifying the `predict` call in the load-weights tests.
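In spirit, such a workaround looks something like the sketch below: compiling the grouped-convolution call with XLA via `tf.function(jit_compile=True)`. This is an illustration of the general idea, not the actual patch in #15868.

```python
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv2D(16, 3, groups=8, padding="same")

# XLA compilation supports grouped convolutions on CPU, unlike the
# default fused CPU kernel at the time.
@tf.function(jit_compile=True)
def call_conv(x):
    return conv(x)

out = call_conv(np.random.rand(1, 32, 32, 16).astype("float32"))
```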
Thanks a lot for this! Really appreciate it.
PiperOrigin-RevId: 421338796
Today these models were pushed to the official docs. I sincerely thank the Keras team for allowing me to add these models. Huge thanks to the TPU Research Cloud (TRC) for providing TPUs for the entire duration of this project, without which this would not have been possible. Thanks a lot to @fchollet for allowing this and guiding me throughout the process. Thanks to @qlzh727 for his guidance in building Keras from source on TPU VMs. Thanks to @mattdangerw for his support regarding grouped convolutions. Special thanks to @lgeiger for his contributions to the code. Last but not least, thanks a ton to @sayakpaul for his continuous guidance and encouragement.
Congrats on getting it landed, and thanks for all the hard work on this! This is great to have!
Please see #15240 and #15419.
Progress so far:
X variant:
Y variant:
/cc @fchollet @sayakpaul @qlzh727
/auto Closes #15240.