
Adding RegNets to tf.keras.applications #15702

Closed
wants to merge 12 commits into from

Conversation

AdityaKane2001 (Contributor) commented Nov 25, 2021

Please see #15240 and #15419.

Progress so far:

X variant:

| Model | Paper (%) | Ours (%) | Diff (%) | Comments |
|-------|-----------|----------|----------|----------|
| X002  | 68.9      | 67.15    | 1.75     | adamw, area_factor=0.25 |
| X004  | 72.6      | 71.22    | 1.38     | adamw, area_factor=0.08 |
| X006  | 74.1      | 72.37    | 1.73     | adamw, area_factor=0.08 |
| X008  | 75.2      | 73.45    | 1.75     | adamw, area_factor=0.08 |
| X016  | 77.0      | 75.55    | 1.45     | adamw, area_factor=0.08, mixup=0.2 |
| X032  | 78.3      | 77.09    | 1.21     | adamw, area_factor=0.08, mixup=0.2 |
| X040  | 78.6      | 77.87    | 0.73     | adamw, area_factor=0.08, mixup=0.2 |
| X064  | 79.2      | 78.22    | 0.98     | adamw, area_factor=0.08, mixup=0.3 |
| X080  | 79.3      | 78.41    | 0.89     | adamw, area_factor=0.08, mixup=0.3 |
| X120  | 79.7      | 79.09    | 0.61     | adamw, area_factor=0.08, mixup=0.4 |
| X160  | 80.0      | 79.53    | 0.47     | adamw, area_factor=0.08, mixup=0.4 |
| X320  | 80.5      | 80.35    | 0.15     | adamw, area_factor=0.08, mixup=0.4 |

Y variant:

| Model | Paper (%) | Ours (%) | Diff (%) | Comments |
|-------|-----------|----------|----------|----------|
| Y002  | 70.3      | 68.51    | 1.79     | adamw, WD=1e-5, area_factor=0.16, mixup=0.2 |
| Y004  | 74.1      | 72.11    | 1.99     | adamw, area_factor=0.16, mixup=0.2, WD=1e-5 |
| Y006  | 75.5      | 73.52    | 1.98     | adamw, area_factor=0.16, mixup=0.2 |
| Y008  | 76.3      | 74.48    | 1.82     | adamw, area_factor=0.16, mixup=0.2 |
| Y016  | 77.9      | 76.95    | 0.95     | adamw, area_factor=0.08, mixup=0.2 |
| Y032  | 78.9      | 78.05    | 0.85     | adamw, area_factor=0.08, mixup=0.2 |
| Y040  | 79.4      | 78.20    | 1.20     | adamw, area_factor=0.08, mixup=0.2 |
| Y064  | 79.9      | 78.95    | 0.95     | adamw, area_factor=0.08, mixup=0.3 |
| Y080  | 79.9      | 79.11    | 0.69     | adamw, area_factor=0.08, mixup=0.3 |
| Y120  | 80.3      | 79.45    | 0.85     | adamw, area_factor=0.08, mixup=0.4 |
| Y160  | 80.4      | 79.71    | 0.69     | adamw, area_factor=0.08, mixup=0.4 |
| Y320  | 80.9      | 80.12    | 0.78     | adamw, area_factor=0.08, mixup=0.4 |

/cc @fchollet @sayakpaul @qlzh727
/auto Closes #15240.

innat commented Nov 25, 2021

@AdityaKane2001
Wondering, looks like it's the first time we're going to have lots of variants of the same model in tf.keras.applications. Great job anyway.

AdityaKane2001 (Contributor, Author)

@innat

Yes, that is the case. Thank you :)

MrinalTyagi

> @AdityaKane2001 Wondering, looks like it's the first time we're going to have lots of variants of the same model in tf.keras.applications. Great job anyway.

@innat Isn't the addition of ResNet-18 and ResNet-34 possible in a similar fashion?

innat commented Nov 26, 2021

@MrinalTyagi
I agree, but I don't know why those models aren't there. There are some pending requests (1, 2, 3, 4). Maybe there are some criteria that I'm unaware of.

@gbaned gbaned added the keras-team-review-pending Pending review by a Keras team member. label Nov 26, 2021
lgeiger (Contributor) left a comment

Thanks again for opening this PR! I have two tiny suggestions to slightly improve readability.

keras/applications/regnet.py — 5 resolved review comments (outdated)
AdityaKane2001 (Contributor, Author)

@lgeiger Always appreciated! Thanks for the changes. I'll merge them tomorrow.

mattdangerw (Member) left a comment

Thanks for the PR! Left a few comments. Do you have the weights for the applications available online somewhere?

keras/applications/regnet.py — 9 resolved review comments (outdated)
AdityaKane2001 (Contributor, Author) commented Dec 1, 2021

@mattdangerw

Thanks for the review! Made requested changes.

> Do you have the weights for the applications available online somewhere?

I am still training these models, and I'm updating the tables at the start of the thread accordingly. I'll test the weight-loading code in parallel.

@mattdangerw mattdangerw removed the keras-team-review-pending Pending review by a Keras team member. label Dec 2, 2021
AdityaKane2001 (Contributor, Author) commented Dec 10, 2021

@fchollet @mattdangerw

I have completed training of all the models. I have updated the tables at the start of the thread accordingly. There are a couple of things I want to bring to your notice:

  1. model.predict(X_test) does not work with grouped convolutions on CPU. For some reason, model(X_test) works flawlessly. I have therefore updated the applications_load_weight_test.py file accordingly. If needed, I'll open an issue for this.
  2. All models are within 2% of the accuracies mentioned in the paper. Larger models are within 1%.
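The distinction in (1) can be sketched with a toy grouped-convolution model (illustrative only, not the RegNet code from this PR): a direct call runs eagerly, while predict goes through Keras's traced/compiled execution path, which is where the CPU grouped-convolution kernel was failing.

```python
import tensorflow as tf

# Toy model with a grouped convolution (groups > 1), standing in for RegNet.
inputs = tf.keras.Input(shape=(8, 8, 4))
outputs = tf.keras.layers.Conv2D(
    filters=8, kernel_size=3, groups=2, padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

x = tf.random.normal((1, 8, 8, 4))
eager_out = model(x)          # direct call: eager execution, returns a Tensor
graph_out = model.predict(x)  # predict: traced execution, returns a NumPy array
print(eager_out.shape, graph_out.shape)
```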

/cc @sayakpaul

@google-ml-butler google-ml-butler bot removed the ready to pull Ready to be merged into the codebase label Dec 16, 2021
AdityaKane2001 (Contributor, Author) commented Dec 16, 2021

@mattdangerw @fchollet

I have made the requested changes. Please run the workflow again.

I wanted to provide performance metrics for all the models, as on keras.io/applications. However, I am not able to spin up a VM instance with the given specs [1] on Google Cloud.

Could you please tell me how to go about this?

/cc @sayakpaul

Footnotes

  [1] CPU: AMD EPYC Processor (with IBPB), 92 cores; RAM: 1.7 TB; GPU: Tesla A100

AdityaKane2001 (Contributor, Author)

@fchollet @mattdangerw

Could you please take a look at this one? TIA

@google-ml-butler google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Dec 21, 2021
AdityaKane2001 (Contributor, Author)

Thanks for the approval!

sayakpaul (Contributor)

@mattdangerw could you also provide an update on what we should do about this?

> I wanted to provide performance metrics for all the models, as on keras.io/applications. However, I am not able to spin up a VM instance with the given specs on Google Cloud.

mattdangerw (Member)

Yeah, re ideal machine for metrics, particularly the performance per step numbers, I am not sure. This is probably a question for @fchollet. We might take a bit to get back to you on this given it's the holidays and a lot of the team is out.

In the meantime, I think we can move ahead and try to land this PR, as the table update will be a separate change to keras.io anyway.

mattdangerw (Member) left a comment

I've uploaded the weights and generated the API files for this PR; things are looking good overall. However, the change to applications_load_weight_test.py is breaking right now. Commented on the line. Is that change necessary? Thanks!

```diff
@@ -115,7 +125,7 @@ def test_application_pretrained_weights_loading(self):
     self.assertShapeEqual(model.output_shape, (None, _IMAGENET_CLASSES))
     x = _get_elephant(model.input_shape[1:3])
     x = app_module.preprocess_input(x)
-    preds = model.predict(x)
+    preds = model(x).numpy()
```
Member:

Is there a reason this needs to be updated? Do things break otherwise? We still run these test cases in TF1 without eager mode enabled, and this line is breaking a number of our tests.

Contributor Author:

@mattdangerw

Yes, the change is necessary. Grouped convolutions are not yet fully supported on CPUs. We see that model.predict(X_test) breaks whereas model(X_test) works fine.

There are a number of issues discussing this in the TF repo.

Member:

I think you could trigger the failures by adding a call to tf.compat.v1.disable_v2_behavior() before the call to tf.test.main in applications_load_weight_test.py. We can't submit this if we are breaking all these application tests in a TF1 context. We would need to find a change that does not rely on eager-mode behavior (.numpy() is eager-only).

This might mean we need to dig into the difference between a direct call vs. predict here. It sounds like this is an issue with grouped convolutions on CPU that only appears when compiling a tf.function, is that right?
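The reproduction suggested above can be sketched as a minimal standalone snippet (not the actual test file): tf.compat.v1.disable_v2_behavior() switches the process into TF1 graph mode, after which eager-only APIs such as Tensor.numpy() are unavailable on symbolic tensors.

```python
import tensorflow as tf

# Switch the whole process into TF1-style graph mode, as suggested above.
# This must run before any other TF ops are created.
tf.compat.v1.disable_v2_behavior()

# Eager execution is now off, so the .numpy() fallback in the load-weights
# test would fail in this context.
print(tf.executing_eagerly())
```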

Contributor Author:

@mattdangerw

I have found a small solution to this. I have tested the change both with TF2 and using tf.compat.v1.disable_v2_behavior() and it works on my end. Could you please take a look and run the workflow again?

```python
x = app_module.preprocess_input(x)
try:
    preds = model.predict(x)  # Works in TF1
except Exception:
    preds = model(x).numpy()  # Works in TF2
names = [p[1] for p in app_module.decode_predictions(preds)[0]]
```

Member:

I will take a look next week! Sorry for the delayed reply; most of the team is out this week. The proposed change would still not be submittable, because that fallback (the numpy call) would still run in a TF1 context for the RegNet load-weights test unless we disable it.

Overall, I think our options are...

  1. Disable the load weights test for regnet (without removing the predict call here), and follow up with a fix.
  2. Fix the underlying CPU/compiled function/grouped convolution issue, and then land this PR.
  3. Work around the bug for regnets somehow (the conversation here suggests that using jit_compile=True may allow CPU to work, which might give us a way forward).

I would say 3) would be the way to go if we can make it work. We really do want the load-weights tests to exercise the compiled predict function (that's how these models will often be used!), and shipping RegNets such that predict fails on CPU by default is not a great out-of-the-box experience.

Will follow up next week when people are back in office. Thanks!
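Option 3) can be sketched with a toy grouped-convolution model (illustrative only, not the actual fix): wrapping the forward pass in a tf.function with jit_compile=True forces XLA compilation, which is the route the discussion above suggests for making grouped convolutions run on CPU.

```python
import tensorflow as tf

# Toy grouped-convolution model standing in for RegNet.
inputs = tf.keras.Input(shape=(8, 8, 4))
outputs = tf.keras.layers.Conv2D(8, 3, groups=2, padding="same")(inputs)
model = tf.keras.Model(inputs, outputs)

# Force XLA compilation of the forward pass (the jit_compile=True route).
compiled_fn = tf.function(model, jit_compile=True)
out = compiled_fn(tf.random.normal((1, 8, 8, 4)))
print(out.shape)
```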

AdityaKane2001 (Contributor, Author) commented Dec 31, 2021

@mattdangerw

Thanks for the detailed explanation.

I guess (2) is not something that can be done in the Keras codebase, as the error is thrown in tensorflow/tensorflow/core/kernels/conv_ops_fused_impl.h. I'll open an issue in the TF repo about this. So I agree that (3) might be the best option.

Lastly, I wish you a very happy new year!

Contributor Author:

@mattdangerw

Could you please take a look at this one? TIA

/cc @fchollet @qlzh727

Member:

Still working on this! We think we have found a good workaround (option 3): forcing XLA compilation for grouped convolutions. #15868

Once that lands (assuming it doesn't run into roadblocks), we can submit this without modifying the predict call in the load-weights tests.

Contributor Author:

@mattdangerw

Thanks a lot for this! Really appreciate it.

@google-ml-butler google-ml-butler bot removed the ready to pull Ready to be merged into the codebase label Dec 24, 2021
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Dec 25, 2021
@qlzh727 qlzh727 removed the keras-team-review-pending Pending review by a Keras team member. label Jan 4, 2022
AdityaKane2001 (Contributor, Author)

@mattdangerw

Thanks a ton for #15868!

I have tested the code on my end and rolled back 851ca16 in cf25748.

copybara-service bot pushed a commit that referenced this pull request Jan 12, 2022
AdityaKane2001 (Contributor, Author)

Today these models were pushed to the official docs. I sincerely thank the Keras team for allowing me to add these models. Huge thanks to the TPU Research Cloud (TRC) for providing TPUs for the entire duration of this project, without which this would not have been possible. Thanks a lot to @fchollet for allowing this and guiding me throughout the process. Thanks to @qlzh727 for his guidance in building Keras from source on TPU VMs. Thanks to @mattdangerw for his support regarding grouped convolutions. Special thanks to @lgeiger for his contributions to the code. Last but not least, thanks a ton to @sayakpaul for his continuous guidance and encouragement.

mattdangerw (Member)

Congrats on getting it landed and thanks for all the hard work on this! This is great to have!


Successfully merging this pull request may close these issues.

Interested in adding RegNets to tf.keras.applications