
Optional Torch Multiprocessing in nnUNet for Improved Security and Compatibility #2614

Open · wants to merge 6 commits into base: master
Conversation

@LennyN95 commented Nov 22, 2024

This PR resolves #2556.

  • Implements single-process preprocessing as an opt-out of torch multiprocessing (see the sketch below)
  • Introduces new environment variables to set -npp and -nps
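
The core idea, roughly sketched below, is to fall back to an inline preprocessing loop when multiprocessing is disabled. The function and parameter names here are illustrative, not the PR's actual code:

```python
from typing import Any, Callable, Iterable, Iterator

def preprocessing_iterator(cases: Iterable[Any],
                           preprocess: Callable[[Any], Any],
                           num_processes: int) -> Iterator[Any]:
    if num_processes <= 1:
        # single-process fallback: run preprocessing inline, one case at a time,
        # without spawning any torch.multiprocessing workers
        for case in cases:
            yield preprocess(case)
    else:
        # nnU-Net's existing queue-based multiprocessing path would go here;
        # it is omitted from this sketch
        raise NotImplementedError("multiprocessing path not shown in this sketch")
```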

We have done some internal testing and obtain identical results between the latest nnunetv2==2.5.1 and our proposed patch.

surajpaib and others added 6 commits November 21, 2024 08:15
New environment variables:
- nnUNet_npp
- nnUNet_nps

Default values remain unchanged; the CLI parameters -npp and -nps override the environment variables if set.
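
A minimal sketch of that precedence (CLI over environment variable over default); the helper name and the fallback value of 3 are placeholders, not the PR's exact code:

```python
import os

def resolve_worker_counts(cli_npp: int | None, cli_nps: int | None) -> tuple[int, int]:
    """CLI values win; otherwise fall back to nnUNet_npp / nnUNet_nps;
    otherwise keep the existing defaults (3 used here as a stand-in)."""
    npp = cli_npp if cli_npp is not None else int(os.environ.get("nnUNet_npp", 3))
    nps = cli_nps if cli_nps is not None else int(os.environ.get("nnUNet_nps", 3))
    return npp, nps
```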
@LennyN95 (Author)
@FabianIsensee I have noticed some (significant) differences between nnunetv2==2.0 and nnunetv2==2.5.1. This is beyond the scope of this PR, but since some of our contributors use that version, we would like to see whether and to what extent we can offer them a solution as well. In general, I am curious whether you have any idea which changes might have led to these differences (we are seeing a Dice score of 0.9451 between the outputs of these two versions for a lung nodule segmentation task).

@FabianIsensee (Member)
Hey, thanks for the PR and sorry for being slow. I have too much on my plate.
One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list, and then yield from that list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This doesn't become quite clear from your message.

@LennyN95 (Author)
Hey @FabianIsensee thanks for the reply and no worries!

One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list, and then yield from that list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Good point! @surajpaib It looks like we can change this to yield each item right after it is preprocessed.
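
For illustration only (these names are not the PR's actual functions), the difference being discussed:

```python
# Current no-queue approach: materialize everything before yielding,
# so every preprocessed case sits in RAM at the same time.
def preprocess_then_yield(cases, preprocess):
    results = [preprocess(c) for c in cases]
    yield from results

# Suggested change: yield each case as soon as it is ready,
# so at most one preprocessed item is held in memory at a time.
def preprocess_and_yield(cases, preprocess):
    for c in cases:
        yield preprocess(c)
```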

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This doesn't become quite clear from your message.

I ran some tests (with this submission model). All MHub models come with test and reference data, so I manually updated the version from nnunetv2==2.0 to nnunetv2==2.5.1 and compared the generated segmentation against the reference. Normally, we would expect a Dice score of ~1 (with slight variations due to rounding errors caused by different GPU architectures). However, in this case I got a Dice score of 0.9451, so the generated masks differ when using the latest version. I was wondering if you could link this to a specific change.
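
For context, the comparison here is the standard Dice overlap between the mask produced by each nnU-Net version and the stored reference mask. A minimal NumPy sketch of that metric (not MHub's actual evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum())
```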

@FabianIsensee (Member)
Is there a way I can reproduce this locally to investigate? For example, can you share both checkpoints plus the reference data that gives Dice = 1 in one case and 0.94 in the other? That way I can track down where things diverge.
