Star RTDETR
Please star RTDETR on the project homepage first to support this project.
Star RTDETR to help more people discover this project.
Describe the bug
Epoch: [24] [3700/4421] eta: 0:04:24 lr: 0.000010 loss: 19.0368 (17.4353) loss_vfl: 0.6064 (0.7137) loss_bbox: 0.0663 (0.1036) loss_giou: 0.5338 (0.5874) loss_vfl_aux_0: 0.7402 (0.8096) loss_bbox_aux_0: 0.0729 (0.1187) loss_giou_aux_0: 0.6301 (0.6288) loss_vfl_aux_1: 0.6860 (0.7895) loss_bbox_aux_1: 0.0646 (0.1089) loss_giou_aux_1: 0.5190 (0.6023) loss_vfl_aux_2: 0.6597 (0.7459) loss_bbox_aux_2: 0.0668 (0.1054) loss_giou_aux_2: 0.5340 (0.5916) loss_vfl_aux_3: 0.6401 (0.7211) loss_bbox_aux_3: 0.0681 (0.1042) loss_giou_aux_3: 0.5312 (0.5888) loss_vfl_aux_4: 0.6299 (0.7149) loss_bbox_aux_4: 0.0665 (0.1037) loss_giou_aux_4: 0.5331 (0.5876) loss_vfl_aux_5: 0.7495 (0.8039) loss_bbox_aux_5: 0.0975 (0.1549) loss_giou_aux_5: 0.7090 (0.7207) loss_vfl_dn_0: 0.5093 (0.5377) loss_bbox_dn_0: 0.0742 (0.1434) loss_giou_dn_0: 0.6288 (0.6582) loss_vfl_dn_1: 0.4673 (0.4842) loss_bbox_dn_1: 0.0617 (0.1143) loss_giou_dn_1: 0.5375 (0.5704) loss_vfl_dn_2: 0.4539 (0.4721) loss_bbox_dn_2: 0.0590 (0.1089) loss_giou_dn_2: 0.5288 (0.5569) loss_vfl_dn_3: 0.4536 (0.4660) loss_bbox_dn_3: 0.0586 (0.1078) loss_giou_dn_3: 0.5254 (0.5546) loss_vfl_dn_4: 0.4468 (0.4654) loss_bbox_dn_4: 0.0587 (0.1077) loss_giou_dn_4: 0.5274 (0.5548) loss_vfl_dn_5: 0.4519 (0.4665) loss_bbox_dn_5: 0.0587 (0.1078) loss_giou_dn_5: 0.5290 (0.5554) time: 0.3496 data: 0.0037 max mem: 17856
Epoch: [24] [3800/4421] eta: 0:03:48 lr: 0.000010 loss: 16.5786 (17.4315) loss_vfl: 0.7236 (0.7142) loss_bbox: 0.0596 (0.1032) loss_giou: 0.3736 (0.5870) loss_vfl_aux_0: 0.7646 (0.8101) loss_bbox_aux_0: 0.0631 (0.1182) loss_giou_aux_0: 0.3962 (0.6285) loss_vfl_aux_1: 0.7769 (0.7899) loss_bbox_aux_1: 0.0598 (0.1085) loss_giou_aux_1: 0.3695 (0.6020) loss_vfl_aux_2: 0.7441 (0.7465) loss_bbox_aux_2: 0.0590 (0.1050) loss_giou_aux_2: 0.3621 (0.5912) loss_vfl_aux_3: 0.7363 (0.7217) loss_bbox_aux_3: 0.0585 (0.1037) loss_giou_aux_3: 0.3685 (0.5884) loss_vfl_aux_4: 0.7446 (0.7157) loss_bbox_aux_4: 0.0583 (0.1033) loss_giou_aux_4: 0.3765 (0.5872) loss_vfl_aux_5: 0.7554 (0.8046) loss_bbox_aux_5: 0.0830 (0.1542) loss_giou_aux_5: 0.4614 (0.7201) loss_vfl_dn_0: 0.5054 (0.5377) loss_bbox_dn_0: 0.0667 (0.1430) loss_giou_dn_0: 0.5499 (0.6583) loss_vfl_dn_1: 0.4458 (0.4842) loss_bbox_dn_1: 0.0603 (0.1139) loss_giou_dn_1: 0.4702 (0.5704) loss_vfl_dn_2: 0.4360 (0.4722) loss_bbox_dn_2: 0.0594 (0.1086) loss_giou_dn_2: 0.4544 (0.5569) loss_vfl_dn_3: 0.4346 (0.4661) loss_bbox_dn_3: 0.0594 (0.1074) loss_giou_dn_3: 0.4574 (0.5545) loss_vfl_dn_4: 0.4363 (0.4655) loss_bbox_dn_4: 0.0593 (0.1074) loss_giou_dn_4: 0.4564 (0.5548) loss_vfl_dn_5: 0.4355 (0.4665) loss_bbox_dn_5: 0.0592 (0.1074) loss_giou_dn_5: 0.4564 (0.5554) time: 0.3734 data: 0.0040 max mem: 17856
Epoch: [24] [3900/4421] eta: 0:03:11 lr: 0.000010 loss: 18.1389 (17.4234) loss_vfl: 0.6440 (0.7142) loss_bbox: 0.0591 (0.1030) loss_giou: 0.5909 (0.5868) loss_vfl_aux_0: 0.6987 (0.8103) loss_bbox_aux_0: 0.0666 (0.1180) loss_giou_aux_0: 0.6362 (0.6282) loss_vfl_aux_1: 0.7231 (0.7900) loss_bbox_aux_1: 0.0646 (0.1082) loss_giou_aux_1: 0.5919 (0.6018) loss_vfl_aux_2: 0.6631 (0.7463) loss_bbox_aux_2: 0.0645 (0.1048) loss_giou_aux_2: 0.5722 (0.5911) loss_vfl_aux_3: 0.6465 (0.7217) loss_bbox_aux_3: 0.0597 (0.1036) loss_giou_aux_3: 0.5741 (0.5882) loss_vfl_aux_4: 0.6460 (0.7159) loss_bbox_aux_4: 0.0596 (0.1031) loss_giou_aux_4: 0.5903 (0.5870) loss_vfl_aux_5: 0.7163 (0.8045) loss_bbox_aux_5: 0.0816 (0.1540) loss_giou_aux_5: 0.6991 (0.7196) loss_vfl_dn_0: 0.5063 (0.5375) loss_bbox_dn_0: 0.0745 (0.1428) loss_giou_dn_0: 0.6293 (0.6578) loss_vfl_dn_1: 0.4663 (0.4840) loss_bbox_dn_1: 0.0632 (0.1137) loss_giou_dn_1: 0.5524 (0.5699) loss_vfl_dn_2: 0.4607 (0.4720) loss_bbox_dn_2: 0.0550 (0.1083) loss_giou_dn_2: 0.5654 (0.5564) loss_vfl_dn_3: 0.4641 (0.4659) loss_bbox_dn_3: 0.0548 (0.1072) loss_giou_dn_3: 0.5649 (0.5541) loss_vfl_dn_4: 0.4619 (0.4652) loss_bbox_dn_4: 0.0547 (0.1072) loss_giou_dn_4: 0.5631 (0.5543) loss_vfl_dn_5: 0.4612 (0.4663) loss_bbox_dn_5: 0.0547 (0.1072) loss_giou_dn_5: 0.5624 (0.5549) time: 0.3582 data: 0.0037 max mem: 17856
Traceback (most recent call last):
File "tools/train.py", line 51, in
main(args)
File "tools/train.py", line 37, in main
solver.fit()
File "/home/amax/ckl/OI-RT-DETR/oi-rtdetr-pytorch/tools/../src/solver/det_solver.py", line 37, in fit
train_stats = train_one_epoch(
File "/home/amax/ckl/OI-RT-DETR/oi-rtdetr-pytorch/tools/../src/solver/det_engine.py", line 46, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/home/amax/miniconda3/envs/rtdetr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/amax/ckl/OI-RT-DETR/oi-rtdetr-pytorch/tools/../src/zoo/rtdetr/rtdetr_criterion.py", line 238, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/amax/miniconda3/envs/rtdetr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/amax/miniconda3/envs/rtdetr/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/amax/ckl/OI-RT-DETR/oi-rtdetr-pytorch/tools/../src/zoo/rtdetr/matcher.py", line 99, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
File "/home/amax/ckl/OI-RT-DETR/oi-rtdetr-pytorch/tools/../src/zoo/rtdetr/box_ops.py", line 52, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
AssertionError
Hello author, sorry to bother you. As the error message shows, the failure happens on the line `box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox)`, i.e. during the cxcywh-to-xyxy conversion. But by this point training has already run for 24 epochs, so the ground-truth `tgt_bbox` should not be the side that fails; that leaves `out_bbox` containing at least one box whose converted xyxy coordinates violate x_max >= x_min (or y_max >= y_min). In other words, the model's predicted cxcywh seems to contain a negative width or height???
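To make that suspicion concrete, here is a minimal sketch of the failure mode. Note this uses my own re-implementation of the standard DETR-style `box_cxcywh_to_xyxy` helper rather than the exact code in `src/zoo/rtdetr/box_ops.py`, and the example box values are made up:

```python
# Sketch: a single negative predicted width makes the assert in
# generalized_box_iou fail, because cx + w/2 ends up smaller than cx - w/2.
import torch

def box_cxcywh_to_xyxy(x: torch.Tensor) -> torch.Tensor:
    # (cx, cy, w, h) -> (x_min, y_min, x_max, y_max)
    cx, cy, w, h = x.unbind(-1)
    return torch.stack([cx - 0.5 * w, cy - 0.5 * h,
                        cx + 0.5 * w, cy + 0.5 * h], dim=-1)

# Hypothetical prediction with w < 0 (the kind of output I suspect).
out_bbox = torch.tensor([[0.5, 0.5, -0.01, 0.2]])
xyxy = box_cxcywh_to_xyxy(out_bbox)

# This is the condition asserted inside generalized_box_iou:
print((xyxy[:, 2:] >= xyxy[:, :2]).all())  # tensor(False) -> AssertionError
```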
To Reproduce
This is hard to reproduce and happens only intermittently. Sometimes, after the assert fires in a given epoch, I just continue training with the --resume option and that same epoch passes. For example, after the epoch 24 error above, I resumed from the epoch 23 .pth checkpoint and epoch 24 then trained through without issue. I am not sure how to mitigate or fix this; if the author has time, any suggestions would be much appreciated, thanks!
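One band-aid I have been considering (purely my own guess, not something confirmed by the maintainers) is to clamp the predicted widths/heights to a small positive value before the GIoU cost in the matcher, so a stray negative w/h cannot trip the assert:

```python
# Hypothetical helper, not part of the repository: force w, h >= eps
# while leaving the box centers untouched.
import torch

def sanitize_cxcywh(boxes: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack([cx, cy, w.clamp(min=eps), h.clamp(min=eps)], dim=-1)

# Sketch of where it could go in matcher.py, before the GIoU cost:
# out_bbox = sanitize_cxcywh(out_bbox)
# cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
#                                  box_cxcywh_to_xyxy(tgt_bbox))
```

I have no idea whether masking out such degenerate predictions instead would be more correct, so any guidance is welcome.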