Computer Science ›› 2025, Vol. 52 ›› Issue (12): 150-157. doi: 10.11896/jsjkx.241200021

• Computer Graphics & Multimedia •

ETF-YOLO11n: Object Detection Method Based on Multi-scale Feature Fusion for Traffic Images

XIA Shufang, YIN Haonan, QU Zhong   

  1. School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
• Received: 2024-12-02  Revised: 2025-03-30  Published: 2025-12-15  Online: 2025-12-09
  • Corresponding author: QU Zhong (quzhong@cqupt.edu.cn)
  • About author: XIA Shufang (xiasf@cqupt.edu.cn), born in 1980, Ph.D. Her main research interests include computer vision, machine learning and artificial intelligence.
    QU Zhong, born in 1972, Ph.D., professor. His main research interests include computer vision, machine learning and artificial intelligence.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62576058, 62571077), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJZD-M202300604) and the Natural Science Foundation of Chongqing, China (2023NSCQ-MSX1781).



Abstract: Deep learning algorithms have made significant progress in the field of computer vision in recent years, but the accuracy of object detection in complex traffic scenes is still unsatisfactory because traffic objects are small, their features are inconspicuous, and they are susceptible to interference. To address this problem, this paper improves the state-of-the-art YOLO11 and designs ETF-YOLO11n (Effective Traffic Feature YOLO), a multi-scale feature fusion model. Firstly, a triple feature fusion module (TFF) is designed to effectively fuse the feature maps of different sizes extracted by the backbone. Secondly, a feature enhancement module based on hybrid dilated convolution (HDCFE) is designed and added to the neck of the model, where it integrates the features extracted from different receptive fields and reduces the interference caused by occlusion and overlapping. Finally, the proposed GeoCIoU replaces CIoU; through two different penalty terms, the model gives more accurate feedback on how well a predicted box matches the ground-truth box. ETF-YOLO11n achieves an AP of 65.6% and an mAP@0.5 of 90.7% on the KITTI traffic dataset, improvements of 2.4 and 1.2 percentage points over the baseline YOLO11n. In addition, ETF-YOLO11n achieves 42.5% AP and 59.8% mAP@0.5 on COCO-Traffic, and transferring the proposed methods to YOLOv8 yields 66.9% AP and 91.5% mAP@0.5 on KITTI. The results show that the proposed methods significantly improve detection performance, generalize well across models and datasets, and strike a good balance between accuracy and parameter count. The source code has been released.
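
The abstract names three building blocks (TFF, HDCFE, GeoCIoU) without giving their internals, so the two sketches below illustrate the underlying ideas in PyTorch. They are minimal sketches inferred from the abstract's wording, not the authors' released code; all module names, channel widths, dilation rates, and tensor shapes are assumptions.

First, a TFF-style fusion that resamples three backbone scales onto one grid, followed by an HDCFE-style block of hybrid dilated convolutions that mixes several receptive fields:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleFeatureFusion(nn.Module):
    """Resample three feature maps to a common stride and fuse with a 1x1 conv."""
    def __init__(self, c3, c4, c5, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c3 + c4 + c5, c_out, kernel_size=1)

    def forward(self, p3, p4, p5):
        # Bring the coarser maps (strides 16 and 32) up to the stride-8 grid.
        p4 = F.interpolate(p4, size=p3.shape[2:], mode="nearest")
        p5 = F.interpolate(p5, size=p3.shape[2:], mode="nearest")
        return self.reduce(torch.cat([p3, p4, p5], dim=1))

class HybridDilatedFE(nn.Module):
    """Parallel 3x3 convs with dilation rates 1/2/5; the rates are an assumption."""
    def __init__(self, c, rates=(1, 2, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, kernel_size=3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(c * len(rates), c, kernel_size=1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(y)  # residual connection keeps fine detail

# Dummy backbone outputs at strides 8/16/32.
p3, p4, p5 = torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40), torch.randn(1, 256, 20, 20)
out = HybridDilatedFE(128)(TripleFeatureFusion(64, 128, 256, 128)(p3, p4, p5))
print(out.shape)  # torch.Size([1, 128, 80, 80])

Second, the loss side. The abstract says only that GeoCIoU replaces CIoU and uses two different penalty terms to reflect how well a predicted box matches the ground-truth box; its exact form is not given here, so the sketch shows the standard CIoU baseline being replaced (IoU plus a center-distance penalty plus an aspect-ratio penalty), with boxes in (x1, y1, x2, y2) layout:

import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # Intersection area.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    # Union area.
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = wp * hp + wt * ht - inter + eps
    iou = inter / union
    # Penalty 1: normalized center distance over the enclosing-box diagonal.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2
            + (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4
    # Penalty 2: aspect-ratio consistency.
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss(torch.tensor([[10., 10., 50., 60.]]), torch.tensor([[12., 8., 48., 62.]])))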

Key words: Object detection, Multi-feature fusion, Intersection over Union, Feature enhancement, Complex traffic scenarios

CLC number: TP391.41

References
[1]CHEN H,WAN W W,MATSUSHITA A,et al.Automatically Prepare Training Data for YOLO Using Robotic In-Hand Observation and Synthesis[J].IEEE Transactions on Automation Science and Engineering,2024,21(3):4876-4982.
[2]LIU W,ANGUELOV D,ERHAN D,et al.SSD:Single Shot MultiBox Detector[C]//European Conference on Computer Vision.Springer,2016:21-37.
[3]JOCHER G.YOLOv5[EB/OL].(2020-06-28) [2024-11-17].https://github.com/ultralytics/yolov5.
[4]LI C Y,LI L L,JIANG H L,et al.YOLOv6:A Single-Stage Object Detection Framework for Industrial Applications[J].arXiv:2209.02976,2022.
[5]GE Z,LIU S T,WANG F,et al.YOLOX:Exceeding YOLO Series in 2021[J].arXiv:2107.08430,2021.
[6]WANG C Y,BOCHKOVSKIY A,LIAO H Y M.YOLOv7:Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors[C]//IEEE Conference on Computer Vision and Pattern Recognition.2023:7464-7475.
[7]JOCHER G.YOLOv8[EB/OL].(2023-01-10) [2024-11-17].https://github.com/ultralytics/yolov8.
[8]WANG C Y,YEH I H,LIAO H Y M,et al.YOLOv9:Learning What You Want to Learn Using Programmable Gradient Information[C]//European Conference on Computer Vision.Springer,2024:1-21.
[9]JOCHER G.YOLO11[EB/OL].(2024-09-30) [2024-11-17].https://github.com/ultralytics/ultralytics.
[10]REN S Q,HE K M,GIRSHICK R,et al.Faster R-CNN:Towards Real-time Object Detection with Region Proposal Networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[11]CARION N,MASSA F,SYNNAEVE G,et al.End-to-end Object Detection with Transformers[C]//European Conference on Computer Vision.Springer,2020:213-229.
[12]Government of the People’s Republic of China.Motor Vehicles Nationwide Reached 440 Million in the First Half of 2024[EB/OL].(2024-07-09) [2024-11-17].https://www.gov.cn/lianbo/bumen/202407/content_6961935.htm.
[13]GEIGER A,LENZ P,STILLER C,et al.Vision Meets Robotics:The KITTI Dataset[J].The International Journal of Robotics Research,2013,32(11):1231-1237.
[14]LIN T Y,DOLLAR P,GIRSHICK R,et al.Feature Pyramid Networks for Object Detection[C]//IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2017:936-944.
[15]LIU S,QI L,QIN H F,et al.Path Aggregation Network for Instance Segmentation [C]//IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2018:8759-8768.
[16]LAN L X,CHI M Y.Remote Sensing Change Detection Based on Feature Fusion and Attention Network[J].Computer Science,2022,49(6):193-198.
[17]LU H T,FANG M Y,QIU Y X,et al.An Anchor-Free Defect Detector for Complex Background Based on Pixelwise Adaptive Multiscale Feature Fusion[J].IEEE Transactions on Instrumentation and Measurement,2023,72:1-12.
[18]YU J H,JIANG Y N,WANG Z Y,et al.UnitBox:An Advanced Object Detection Network[C]//International Conference on Multimedia.ACM,2016:516-520.
[19]REZATOFIGHI H,TSOI N,GWAK J Y,et al.Generalized Intersection over Union:A Metric and a Loss for Bounding Box Regression[C]//IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2019:658-666.
[20]ZHENG Z H,WANG P,LIU W,et al.Distance-IoU loss:Faster and Better Learning for Bounding Box Regression [C]//AAAI Conference on Artificial Intelligence.AAAI,2020:12993-13000.
[21]GEVORGYAN Z.SIoU Loss:More Powerful Learning for Bounding Box Regression[J].arXiv:2205.12740,2022.
[22]LUO X,CAI Z,SHAO B,et al.Unified-IoU:For High-Quality Object Detection [J].arXiv:2408.06636,2024.
[23]HU J,SHEN L,SUN G.Squeeze-and-excitation Networks[C]//IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2018:7132-7141.
[24]WOO S,PARK J,LEE J Y,et al.CBAM:Convolutional Block Attention Module[C]//European Conference on Computer Vision.Springer,2018:3-19.
[25]REN X X,LI M,LI Z H,et al.Curiosity-driven Attention for Anomaly Road Obstacles Segmentation in Autonomous Driving[J].IEEE Transactions on Intelligent Vehicles,2022,8(3):2233-2243.
[26]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is All You Need[J].arXiv:1706.03762,2017.
[27]FULLER A,KYROLLOS D G,YASSIN Y,et al.LookHere:Vision Transformers with Directed Attention Generalize and Extrapolate[J].arXiv:2405.13958,2024.
[28]GAO L Y,QU Z,WANG S Y,et al.A Lightweight Neural Network Model of Feature Pyramid and Attention Mechanism for Traffic Object Detection[J].IEEE Transactions on Intelligent Vehicles,2024,9(2):3422-3435.
[29]WANG S Y,QU Z,GAO L Y,et al.Multi-spatial Pyramid Feature and Optimizing Focal Loss Function for Object Detection[J].IEEE Transactions on Intelligent Vehicles,2023,9(1):1054-1065.
[30]LIN T Y,GOYAL P,GIRSHICK R,et al.Focal Loss for Dense Object Detection[C]//IEEE International Conference on Computer Vision.IEEE,2017:2999-3007.