Problem with large training dataset #1368
-
Hi, nnU-Net's resource usage does not increase with more training cases, so there is no upper limit to the size of the dataset you can use. We have trained nnU-Nets with well over 4k training samples just fine. What is strange to me is that no real error message is given; all we see is that some background worker died an unexpected death.
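If the job runs under SLURM, one thing worth checking is whether the scheduler killed a worker for exceeding the memory limit, since in that case the worker just disappears without a Python traceback. A minimal sketch (assuming `sacct` is available on your cluster; the job id is a placeholder):

```python
# Sketch: query SLURM accounting for the failed job and look for an
# OUT_OF_MEMORY state or a MaxRSS close to the requested memory.
import subprocess

job_id = "123456"  # placeholder: replace with the actual SLURM job id

result = subprocess.run(
    ["sacct", "-j", job_id, "--format=JobID,State,ExitCode,MaxRSS,ReqMem"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```

A State of OUT_OF_MEMORY (or a MaxRSS right at ReqMem) would point to the background workers being killed by the memory limit rather than a bug in the code.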
-
Thank you for this great repository.
-
Hello,
I have problems with training when using a very large training sample. In particular, I tried to train nnU-Net with 1470 cases in the training sample, each comprising one T1w MRI brain scan as the input modality and a label map with one label.
I also tried reducing the number of images in the training sample to find the rough upper limit of cases that can be handled during training. With 1050 cases it still didn't work, but after reducing the training sample to 700 cases, training worked fine.
The error message I get for very large training samples refers to "multi_threaded_augmenter.py" and usually occurs during one of the first epochs; see below for a copy of the error message. I also tried increasing the memory allowed for the SLURM job to very large values (e.g. 1 task x 24 CPUs x 20,000 MB), but this didn't help either.
Is there a strategy for working with such large training samples? For example, would deactivating data augmentation help, and is there an easy way to do this?
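For instance, here is a rough, untested sketch of what I have in mind: instead of disabling augmentation entirely, reduce the number of background augmentation workers. This assumes the `nnUNet_n_proc_DA` environment variable is what controls how many processes the MultiThreadedAugmenter spawns, and that the v1 `nnUNet_train` entry point is used; the task name and fold below are placeholders:

```python
# Rough sketch (untested): fewer augmentation workers should lower the total
# memory footprint of the background processes.
import os
import subprocess

os.environ["nnUNet_n_proc_DA"] = "6"  # assumed env var for the number of DA workers

# Placeholder task/fold; replace with the actual task name and fold.
subprocess.run(
    ["nnUNet_train", "3d_fullres", "nnUNetTrainerV2", "Task500_Brain", "0"],
    check=True,
)
```

Would something along these lines be a sensible approach, or is there a better way?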
Thank you for any help and suggestions!
Benno
PS:
Here is an example of the full error message: