RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message #2207
Unanswered
yonas-babulet asked this question in Q&A
Hello, I ran into a problem while trying to train the model on the BrainTumour dataset with the 3d_fullres configuration, running on a server. Can you help me fix it? This is the command and the output:
nnUNetv2_train Dataset001_BrainTumour 3d_fullres 0
Using device: cuda:0
2024-05-20 23:54:36.417725: failed to log: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7811a8b515c0>)
2024-05-20 23:54:36.417725: failed to log: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7811a8b515c0>)
2024-05-20 23:54:36.417725: failed to log: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7811a8b515c0>)
2024-05-20 23:54:36.417725: failed to log: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7811a8b515c0>)
2024-05-20 23:54:36.417725: failed to log: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7811a8b515c0>)
#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################
/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/optim/lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
This is the configuration used by this training:
Configuration name: 3d_fullres
{'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [128, 128, 128], 'median_image_size_in_voxels': [138.0, 169.0, 138.0], 'spacing': [1.0, 1.0, 1.0], 'normalization_schemes': ['ZScoreNormalization', 'ZScoreNormalization', 'ZScoreNormalization', 'ZScoreNormalization'], 'use_mask_for_norm': [True, True, True, True], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2], 'num_pool_per_axis': [5, 5, 5], 'pool_op_kernel_sizes': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'unet_max_num_features': 320, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': False}
These are the global plan.json settings:
{'dataset_name': 'Dataset001_BrainTumour', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 1.0, 1.0], 'original_median_shape_after_transp': [138, 169, 138], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 5721.0, 'mean': 728.8666381835938, 'median': 779.0, 'min': 0.0, 'percentile_00_5': 104.0, 'percentile_99_5': 1733.0, 'std': 354.5618896484375}, '1': {'max': 8761.0, 'mean': 621.560791015625, 'median': 644.0, 'min': 0.0, 'percentile_00_5': 56.0, 'percentile_99_5': 2421.0, 'std': 335.946044921875}, '2': {'max': 9012.0, 'mean': 662.5552368164062, 'median': 639.0, 'min': 0.0, 'percentile_00_5': 44.0, 'percentile_99_5': 2963.0, 'std': 420.2735595703125}, '3': {'max': 3346.0, 'mean': 664.2885131835938, 'median': 647.0, 'min': 0.0, 'percentile_00_5': 103.0, 'percentile_99_5': 1997.0, 'std': 318.48980712890625}}}
2024-05-20 23:54:40.320030: unpacking dataset...
2024-05-20 23:54:40.462576: unpacking done...
2024-05-20 23:54:40.463997: do_dummy_2d_data_aug: False
2024-05-20 23:54:40.465495: Using splits from existing split file: /home/ybabuletabetew/Documents/Projects/NEXTOU/nnUNet_preprocessed/Dataset001_BrainTumour/splits_final.json
2024-05-20 23:54:40.466101: The split file contains 5 splits.
2024-05-20 23:54:40.466152: Desired fold for training: 0
2024-05-20 23:54:40.466182: This split has 387 training and 97 validation cases.
2024-05-20 23:54:51.155321: Unable to plot network architecture:
2024-05-20 23:54:51.155502: CUDA out of memory. Tried to allocate 256.00 MiB. GPU
2024-05-20 23:54:51.187784:
2024-05-20 23:54:51.187838: Epoch 0
2024-05-20 23:54:51.187947: Current learning rate: 0.01
using pin_memory on device 0
Traceback (most recent call last):
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/bin/nnUNetv2_train", line 33, in <module>
sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/Documents/Projects/NEXTOU/nnUNet-2.0/nnunetv2/run/run_training.py", line 247, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/home/ybabuletabetew/Documents/Projects/NEXTOU/nnUNet-2.0/nnunetv2/run/run_training.py", line 190, in run_training
nnunet_trainer.run_training()
File "/home/ybabuletabetew/Documents/Projects/NEXTOU/nnUNet-2.0/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1210, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/Documents/Projects/NEXTOU/nnUNet-2.0/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 848, in train_step
output = self.network(data)
^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/dynamic_network_architectures/architectures/unet.py", line 62, in forward
return self.decoder(skips)
^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/dynamic_network_architectures/building_blocks/unet_decoder.py", line 111, in forward
x = self.stages[s](x)
^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/dynamic_network_architectures/building_blocks/simple_conv_blocks.py", line 137, in forward
return self.convs(x)
^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/dynamic_network_architectures/building_blocks/simple_conv_blocks.py", line 71, in forward
return self.all_modules(x)
^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 610, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
return F.conv3d(
^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU
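For context on why even a 256.00 MiB allocation can fail here: the plan above trains with batch_size 2, four input modalities, and 128³ patches. A rough back-of-envelope sketch (plain Python, not nnU-Net's actual memory accounting; fp32 assumed, gradients and deeper-stage activations ignored) shows how quickly full-resolution tensors add up:

```python
def tensor_mib(batch, channels, d, h, w, bytes_per_elem=4):
    """Size in MiB of one dense fp32 tensor of shape (batch, channels, d, h, w)."""
    return batch * channels * d * h * w * bytes_per_elem / 2**20

# Input batch: batch_size 2, 4 modalities, 128^3 patch (from the plan above)
input_mib = tensor_mib(2, 4, 128, 128, 128)
# First encoder stage: 32 feature channels at full resolution
first_stage_mib = tensor_mib(2, 32, 128, 128, 128)

print(f"input batch:       {input_mib:.0f} MiB")   # 64 MiB
print(f"first conv stage:  {first_stage_mib:.0f} MiB")  # 512 MiB
```

With activations, gradients, and optimizer state on top of this, the 3d_fullres configuration needs several GiB of free VRAM, so the error usually means the GPU is too small or is already partly occupied by other processes (check with `nvidia-smi`). The final `RuntimeError` about background workers is a secondary symptom: the data-loading workers shut down after the OOM killed the training step.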
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/ybabuletabetew/miniconda3/envs/NEXTOU/lib/python3.11/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message