Computer Science (计算机科学), 2025, Vol. 52, Issue (11A): 250200051-7. doi: 10.11896/jsjkx.250200051

• Computer Graphics & Multimedia •

Unmanned Driving Scene Object Detection Method Based on Local Features and Feature Fusion

JI Tao1,2,3, YANG Yifan1,2, FENG Yachun2, WU Lingfan2, LI Xuliang2, LI Yawei2

  1 School of Information Science, Yunnan Normal University, Kunming 650500, China
  2 School of Astronautics, Beihang University, Beijing 102206, China
  3 Southwest United Graduate School, Kunming 650500, China
  • Publication date: 2025-11-15  Release date: 2025-11-10
  • Corresponding author: YANG Yifan (yifanyang@buaa.edu.cn)
  • About the author: jitao09@foxmail.com
  • Supported by:
    National Natural Science Foundation of China (62476017).

Abstract: In unmanned driving scenarios, the accuracy and robustness of object detection are vital to system performance. Existing deep learning-based network models produce false detections and missed detections when handling small and occluded objects in such scenarios; to address this, an LSDA-YOLO network model is proposed. First, a LocalSimAM (Local Simple and Effective Attention Mechanism) attention mechanism is proposed to mitigate information loss and is applied to the Backbone. The SHSA (Single-Head Self-Attention) attention mechanism is also introduced, and an information aggregation network is designed to improve the detection of occluded objects. In the Neck, the upsampling ratio is adjusted dynamically, which improves the model's adaptability to multi-scale features and reduces the missed detection rate for small objects. In the Head, the Adaptive Spatial Feature Fusion (ASFF) strategy is introduced to strengthen multi-scale detection. Experimental results show that, on the KITTI dataset, LSDA-YOLO improves mAP0.5 and mAP0.5:0.95 by 3.1 and 3.9 percentage points, respectively, over the YOLOv11n baseline, making it suitable for high-precision real-time detection in unmanned driving scenarios.
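
The abstract does not spell out how LocalSimAM works. Its name suggests that it extends SimAM, a parameter-free attention module, so the sketch below illustrates only the original SimAM weighting on a single feature-map channel, as an assumed starting point rather than the authors' local variant. The function name `simam_channel` and the regularizer value `e_lambda=1e-4` are illustrative assumptions.

```python
import math


def simam_channel(x, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel given as a 2D list (H x W).

    Assumption: this is the plain, global SimAM weighting, not LocalSimAM.
    """
    flat = [v for row in x for v in row]
    n = len(flat) - 1  # every neuron is compared against the n other neurons
    if n < 1:
        raise ValueError("channel needs at least two elements")

    mu = sum(flat) / len(flat)
    # Squared deviation of each neuron from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in x]
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(x, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: neurons far from the mean score higher.
            inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid gate
            out_row.append(v * weight)
        out.append(out_row)
    return out
```

Calling `simam_channel` on a channel with one strong activation on a flat background gives that activation a larger gate value than the background, which is the behaviour an attention module for small objects would rely on.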

Key words: Attention mechanism, Unmanned driving, Vehicle detection, Pedestrian detection, Feature fusion
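
The abstract reports mAP0.5 and mAP0.5:0.95 without defining them. Under the common COCO-style convention, which this sketch assumes the paper follows, mAP0.5 counts a detection as correct when its IoU with a ground-truth box is at least 0.5, while mAP0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The helper names below are hypothetical and only illustrate the thresholding step, not a full mAP computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def coco_style_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95 (assumed COCO-style convention)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in coco_style_thresholds() if iou_value >= t]
```

A loosely localized box passes only the lower thresholds, so it contributes to mAP0.5 but to few of the terms in mAP0.5:0.95; this is why the second metric is the stricter measure of localization quality.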

CLC Number: TP391.41