-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'MDAnalysis.analysis.density' parallelization #4729
Conversation
added backend and aggregator
Added client_DensityAnalysis
Added client_DensityAnalysis to tests
Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-12-18 19:05:59 UTC |
Linter Bot Results:Hi @talagayev! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4729 +/- ##
===========================================
- Coverage 93.65% 93.63% -0.03%
===========================================
Files 177 189 +12
Lines 21774 22845 +1071
Branches 3064 3064
===========================================
+ Hits 20393 21391 +998
- Misses 929 1002 +73
Partials 452 452 ☔ View full report in Codecov by Sentry. |
Three aspects in my fix.
|
@yuxuanzhuang could you please raise issues for the updates to the parallelization documentation (#4729 (comment)) — I think that's very important to make clear, given that we are now repeatedly running in these issues. Thank you! |
@yuxuanzhuang thanks, the illustration is indeed wrong, as well as the accompanying text. I fixed them in #4760 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we've been figuring out: data structures need to be laid out in __init__
, thus the size of _grid
needs to be set up before _prepare
(which is executed in each worker process).
@@ -465,7 +470,7 @@ def _prepare(self): | |||
grid, edges = np.histogramdd(np.zeros((1, 3)), bins=bins, | |||
range=arange, density=False) | |||
grid *= 0.0 | |||
self._grid = grid | |||
self.results._grid = grid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall grid shape needs to be determined in __init__
, then each worker can have its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, yes that makes sense.
I think there was also an additional case, where I encountered this problem with the parallelization;
I will try to apply the same fix there to see if it works.
Other than that I added the Changelog and the docs, so should be ready to go :)
added changelog
pep + docs addition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine.
I am adding a single comment, otherwise my comments on the code are just observations.
@@ -412,7 +420,6 @@ def __init__(self, atomgroup, delta=1.0, | |||
self._ydim = ydim | |||
self._zdim = zdim | |||
|
|||
def _prepare(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the right change.
I am noticing that the necessary changes for parallelization really shift a lot of what _prepare()
was doing back into __init__
. I guess that's ok and it's better than introducing yet another method like _prepare_init()
... We probably need to review the docs for writing analysis code with AnalysisBase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment so that future readers know why we are doing everything here
and not in _prepare()
.
def _prepare(self): | |
# The grid with its dimensions has to be set up in __init__ | |
# so that parallel analysis works correctly: each process | |
# needs to have a `results._grid` of the same size and the | |
# same `_bins`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
manually added to the file, suggestions don't work on deleted lines
assert D.results.density.grid.shape == (8, 12, 17) | ||
|
||
def test_warn_userdefn_padding(self, universe): | ||
def test_warn_userdefn_padding(self, universe, client_DensityAnalysis): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure if we need to test warnings for all parallel modalities, too, but it's the safe approach and you've already converted all tests so let's leave it in. Just something to think about in the future if our test run times are too long (again...).
Fixes #4677
Changes made in this Pull Request:
DensityAnalysis
inanalysis.density
client_DensityAnalysis
inconftest.py
client_DensityAnalysis
to the tests intest_density.py
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4729.org.readthedocs.build/en/4729/