Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in YOLOv5 5.0 #13485

Open
1 task done
3210448723 opened this issue Jan 9, 2025 · 2 comments
Labels
detect (Object Detection issues, PR's), question (Further information is requested)

Comments

@3210448723

Search before asking

Question

In YOLOv5 5.0, the evaluation results on the validation set reported by train.py during training differ significantly from the results reported by test.py on the same validation set.

Why is this happening? The results from test.py are much higher than those from the validation pass that train.py runs after the same epoch, and this occurs for most epochs. The test.py results are almost perfect and do not match the model's actual performance. Below is a portion of the output logs.

Evaluation Output of train.py on the Validation Set at Epoch 115

2024-12-12 14:37:44,182 - INFO - YOLOv5 🚀 5211d5c torch 2.4.1+cu124 CUDA:0 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:1 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:2 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:3 (NVIDIA GeForce RTX 3090, 24154.375MB)

2024-12-12 14:37:44,192 - INFO - Namespace(adam=False, artifact_alias='latest', batch_size=32, bbox_interval=-1, bucket='', cache_images=True, cfg='', data='data/fankou/EnhancedDataset.yaml', device='0,1,2,3', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/fankou/hyp.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, offline=True, project='runs/train', quad=False, rect=False, resume=True, save_dir='runs/train/exp', save_period=1, single_cls=False, sync_bn=False, total_batch_size=32, upload_dataset=False, weights='./runs/train/exp/weights/last.pt', workers=8, world_size=1)
2024-12-12 14:37:44,193 - INFO - tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
2024-12-12 14:37:44,194 - INFO - hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.0375, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, label_smoothing=0.0
2024-12-12 14:37:47,177 - INFO - 
                 from  n    params  module                                  arguments                     
2024-12-12 14:37:47,181 - INFO -   0                -1  1      7040  models.common.Focus                     [3, 64, 3]                    
2024-12-12 14:37:47,182 - INFO -   1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
2024-12-12 14:37:47,185 - INFO -   2                -1  1    156928  models.common.C3                        [128, 128, 3]                 
2024-12-12 14:37:47,187 - INFO -   3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
2024-12-12 14:37:47,199 - INFO -   4                -1  1   1611264  models.common.C3                        [256, 256, 9]                 
2024-12-12 14:37:47,205 - INFO -   5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
2024-12-12 14:37:47,248 - INFO -   6                -1  1   6433792  models.common.C3                        [512, 512, 9]                 
2024-12-12 14:37:47,277 - INFO -   7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
2024-12-12 14:37:47,296 - INFO -   8                -1  1   2624512  models.common.SPP                       [1024, 1024, [5, 9, 13]]      
2024-12-12 14:37:47,359 - INFO -   9                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
2024-12-12 14:37:47,363 - INFO -  10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
2024-12-12 14:37:47,363 - INFO -  11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
2024-12-12 14:37:47,363 - INFO -  12           [-1, 6]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,382 - INFO -  13                -1  1   2757632  models.common.C3                        [1024, 512, 3, False]         
2024-12-12 14:37:47,383 - INFO -  14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
2024-12-12 14:37:47,383 - INFO -  15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
2024-12-12 14:37:47,383 - INFO -  16           [-1, 4]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,390 - INFO -  17                -1  1    690688  models.common.C3                        [512, 256, 3, False]          
2024-12-12 14:37:47,394 - INFO -  18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
2024-12-12 14:37:47,394 - INFO -  19          [-1, 14]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,411 - INFO -  20                -1  1   2495488  models.common.C3                        [512, 512, 3, False]          
2024-12-12 14:37:47,426 - INFO -  21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
2024-12-12 14:37:47,426 - INFO -  22          [-1, 10]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,490 - INFO -  23                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
2024-12-12 14:37:47,491 - INFO -  24      [17, 20, 23]  1     59235  models.yolo.Detect                      [6, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
2024-12-12 14:37:47,781 - INFO - Model Summary: 499 layers, 46658275 parameters, 46658275 gradients, 114.6 GFLOPS
2024-12-12 14:37:47,781 - INFO - 
2024-12-12 14:37:47,890 - INFO - Transferred 650/650 items from ./runs/train/exp/weights/last.pt
2024-12-12 14:37:47,966 - INFO - Scaled weight_decay = 0.0005
2024-12-12 14:37:47,970 - INFO - Optimizer groups: 110 .bias, 110 conv.weight, 107 other
2024-12-12 14:39:57,377 - INFO - Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/train/exp
Starting training for 300 epochs...

2024-12-13 00:16:49,333 - INFO - 
test: data: {'train': ['/home/user/yuanjinmin/数据集/模型训练/train', '/home/user/yuanjinmin/dataset/obj_train_data/train_pro'], 'val': ['/home/user/yuanjinmin/数据集/模型训练/val', '/home/user/yuanjinmin/dataset/obj_train_data/val_pro'], 'nc': 6, 'names': ['unhelmet', 'helmet', 'cigarette', 'fire', 'smoke', 'safebelt']}, weight: None, batch_size: 64, imgsz: 640, conf_thres: 0.001, iou_thres: 0.6, save_json: False, single_cls: False, augment: False, verbose: True, dataloader: <utils.datasets.InfiniteDataLoader object at 0x784bc87e5100>, save_dir: runs/train/exp, save_txt: False, save_hybrid: False, save_conf: False, plots: False, wandb_logger: <utils.wandb_logging.wandb_utils.WandbLogger object at 0x784bd86e1610>, compute_loss: <utils.loss.ComputeLoss object at 0x784bd5c4e790>, half_precision: True, is_coco: False
2024-12-13 00:17:12,163 - INFO -                Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95
2024-12-13 00:17:12,163 - INFO -                  all        3925       22419       0.745       0.731       0.722       0.406
2024-12-13 00:17:12,164 - INFO -             unhelmet        3925        9246       0.903       0.939       0.938       0.538
2024-12-13 00:17:12,164 - INFO -               helmet        3925       10645        0.86       0.927       0.943       0.754
2024-12-13 00:17:12,164 - INFO -            cigarette        3925         761       0.631       0.618       0.594       0.229
2024-12-13 00:17:12,164 - INFO -                 fire        3925         808        0.57        0.64         0.6       0.325
2024-12-13 00:17:12,164 - INFO -                smoke        3925         717       0.602       0.351        0.35       0.135
2024-12-13 00:17:12,164 - INFO -             safebelt        3925         242       0.904       0.913        0.91       0.457

Evaluation Output of test.py on the Validation Set for the Model at Epoch 115

2025-01-09 20:15:37,890 - INFO - Namespace(augment=False, batch_size=64, conf_thres=0.001, data='data/fankou/EnhancedDataset.yaml', device='0,1,2,3', exist_ok=False, img_size=640, iou_thres=0.6, name='exp', project='runs/test', save_conf=True, save_hybrid=True, save_json=False, save_txt=True, single_cls=False, task='val', verbose=True, weights='runs/train/exp/weights/epoch_115.pt')
2025-01-09 20:15:39,100 - INFO - Fusing layers...
2025-01-09 20:15:40,267 - INFO - Model Summary: 392 layers, 46627491 parameters, 0 gradients, 114.0 GFLOPS
2025-01-09 20:17:21,449 - INFO -                Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95
2025-01-09 20:17:21,449 - INFO -                  all        3925       22419           1           1       0.995       0.995
2025-01-09 20:17:21,449 - INFO -             unhelmet        3925        9246           1           1       0.996       0.996
2025-01-09 20:17:21,449 - INFO -               helmet        3925       10645           1       0.999       0.996       0.996
2025-01-09 20:17:21,449 - INFO -            cigarette        3925         761           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -                 fire        3925         808           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -                smoke        3925         717           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -             safebelt        3925         242           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO - Speed: 3.2/1.8/4.9 ms inference/NMS/total per 640x640 image at batch-size 64
2025-01-09 20:17:22,088 - INFO - Results saved to runs/test/exp
3925 labels saved to runs/test/exp/labels

Additional

No response

@3210448723 3210448723 added the question Further information is requested label Jan 9, 2025
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Jan 9, 2025
@UltralyticsAssistant
Member

👋 Hello @3210448723, thank you for bringing this to our attention and for your interest in YOLOv5 🚀!

It seems you're encountering differences in evaluation metrics between train.py and test.py. This discrepancy might arise due to differences in how the evaluation is performed during training versus testing. To assist you better, could you please share a minimum reproducible example (MRE)? This should include:

  • The exact commands you used for both train.py and test.py.
  • Relevant portions of your dataset or configuration files.
  • Specific details about your training and testing pipelines (e.g., augmentations, hyperparameters, evaluation settings).
  • Versions of YOLOv5, Python, and PyTorch being used.

Additionally, ensure that your environment satisfies these minimum requirements:

  • Python>=3.8.0
  • All dependencies installed as per the requirements.txt included in the repository
  • PyTorch>=1.8 and correctly set up CUDA (if using GPU)
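The version details requested above can be collected with a short script. This is only a sketch: the helper names `parse_version`, `meets_minimum`, and `environment_report` are made up for illustration and are not part of YOLOv5, and the minimums are the ones listed in this comment (Python 3.8.0, PyTorch 1.8).

```python
import importlib
import re
import sys


def parse_version(text):
    """Turn a version string such as '2.4.1+cu124' into a tuple like (2, 4, 1)."""
    core = text.split("+")[0]  # drop local build suffixes such as '+cu124'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so '3.8' and '3.8.0' compare as equal.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(found, minimum):
    """Return True when the found version is at least the minimum version."""
    return parse_version(found) >= parse_version(minimum)


def environment_report():
    """Collect the Python and PyTorch details that are useful in a bug report."""
    python_version = sys.version.split()[0]
    report = {
        "python": python_version,
        "python_ok": meets_minimum(python_version, "3.8.0"),
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
    else:
        report["torch"] = torch.__version__
        report["torch_ok"] = meets_minimum(torch.__version__, "1.8")
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the YOLOv5 commit hash from the training log, covers the version information requested above.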

If applicable, confirm whether you are running YOLOv5 in a local environment or in a cloud-based environment (such as Colab, Paperspace, etc.).

This is an automated response to help guide resolution, and an Ultralytics engineer will assist you further soon. Let us know if you need additional clarification! 😊

@pderrenger
Member

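@3210448723 the large gap between the train.py and test.py results most likely comes from a difference in evaluation configuration rather than from the checkpoint itself. In the logs above, both runs use the same image size (640), batch size (64), conf_thres (0.001) and iou_thres (0.6) with augment disabled, so those settings do not explain it. What does differ is that the test.py run sets save_txt=True, save_conf=True and save_hybrid=True, while the validation pass inside train.py has all three set to False. --save-hybrid is meant for auto-labelling: as far as I can tell from the v5.0 code, it merges the ground-truth labels into the predictions before NMS, which would push P, R and mAP to near-perfect values like the ones you are seeing. test.py also fuses layers before evaluating, but that should not change the metrics noticeably.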
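Comparing the two configurations can be made mechanical. The sketch below is a minimal example in which the two dictionaries were transcribed by hand from the `test:` line of the training log and the `Namespace(...)` line of the test.py log above; the helper names `normalise` and `diff_settings` and the `img_size`/`imgsz` alias table are assumptions made for illustration, not YOLOv5 code.

```python
# Settings of the validation pass inside train.py (from the "test:" log line).
train_val = {
    "batch_size": 64, "imgsz": 640, "conf_thres": 0.001, "iou_thres": 0.6,
    "augment": False, "single_cls": False, "save_json": False,
    "save_txt": False, "save_hybrid": False, "save_conf": False, "verbose": True,
}

# Settings of the standalone test.py run (from its Namespace(...) log line).
test_run = {
    "batch_size": 64, "img_size": 640, "conf_thres": 0.001, "iou_thres": 0.6,
    "augment": False, "single_cls": False, "save_json": False,
    "save_txt": True, "save_hybrid": True, "save_conf": True, "verbose": True,
}

# The two logs name the image size differently, so map both to one key.
ALIASES = {"img_size": "imgsz"}


def normalise(settings):
    """Rename aliased keys so both configurations use the same names."""
    return {ALIASES.get(key, key): value for key, value in settings.items()}


def diff_settings(first, second):
    """Return {key: (first_value, second_value)} for every setting that differs."""
    first, second = normalise(first), normalise(second)
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


differences = diff_settings(train_val, test_run)
for key, (during_training, standalone) in differences.items():
    print(f"{key}: train.py={during_training} test.py={standalone}")
```

With the values from these logs, only the three save_* flags differ, which narrows the search considerably.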

To investigate further:

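  1. Ensure both scripts use the same evaluation settings (e.g., --augment, --img-size, --conf-thres, --iou-thres), and that test.py is not run with output flags such as --save-hybrid that the validation pass in train.py does not use.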
  2. Check if the dataset and preprocessing steps are identical for both scripts.
  3. Confirm the test.py command is evaluating the same checkpoint as the one saved during training.
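While checking item 1, the save_hybrid=True setting in the test.py log deserves special attention. To my understanding of the v5.0 code (please verify against your checkout), this flag appends the ground-truth boxes to the model's predictions with confidence 1.0 for auto-labelling. The toy sketch below is not YOLOv5 code: the boxes, the greedy matcher, and every function name are invented for illustration, and NMS is left out. It only shows why scoring such hybrid predictions against the same labels gives perfect precision and recall.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, labels, conf_thres, iou_thres=0.5):
    """Greedy matching: each label is matched at most once, highest confidence first."""
    kept = sorted((p for p in preds if p["conf"] >= conf_thres),
                  key=lambda p: p["conf"], reverse=True)
    matched, true_pos = set(), 0
    for pred in kept:
        best, best_iou = None, iou_thres
        for i, label in enumerate(labels):
            if i in matched or label["cls"] != pred["cls"]:
                continue
            overlap = iou(pred["box"], label["box"])
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            true_pos += 1
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / len(labels) if labels else 0.0
    return precision, recall


def best_operating_point(preds, labels):
    """Return (P, R) at the confidence threshold with the highest F1."""
    best = (0.0, 0.0, 0.0)  # (f1, precision, recall)
    for thres in sorted({p["conf"] for p in preds}):
        p, r = precision_recall(preds, labels, thres)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[0]:
            best = (f1, p, r)
    return best[1], best[2]


def inject_labels(preds, labels):
    """Mimic hybrid output: append every label as a prediction with confidence 1.0."""
    return preds + [{"box": g["box"], "cls": g["cls"], "conf": 1.0} for g in labels]


labels = [
    {"box": (0, 0, 10, 10), "cls": 1},    # helmet
    {"box": (20, 20, 40, 40), "cls": 4},  # smoke
    {"box": (50, 50, 60, 60), "cls": 3},  # fire
]
preds = [
    {"box": (1, 1, 10, 10), "cls": 1, "conf": 0.90},    # good helmet detection
    {"box": (30, 30, 50, 50), "cls": 4, "conf": 0.60},  # badly localised smoke
    {"box": (70, 70, 80, 80), "cls": 2, "conf": 0.95},  # spurious cigarette
]

plain = best_operating_point(preds, labels)
hybrid = best_operating_point(inject_labels(preds, labels), labels)
print("model only:", plain)
print("with labels injected:", hybrid)
```

The hybrid result no longer measures the model at all, so metrics produced with this flag enabled should not be compared with the training-time validation numbers.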

For additional details on validation differences, consult the YOLOv5 validation documentation. Let me know if you need further clarification!

@3210448723 3210448723 changed the title from "Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in [YOLOv5 5.0](https://github.com/ultralytics/yolov5/releases/tag/v5.0)" to "Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in YOLOv5 5.0" Jan 10, 2025