Unmanned Driving Scene Object Detection Method Based on Local Features and Feature Fusion

doi:10.11896/jsjkx.250200051

Computer Science, 2025, Vol. 52, Issue (11A): 250200051-7. doi: 10.11896/jsjkx.250200051

• Computer Graphics & Multimedia •


JI Tao^1,2,3, YANG Yifan^1,2, FENG Yachun^2, WU Lingfan^2, LI Xuliang^2, LI Yawei^2

1 School of Information Science, Yunnan Normal University, Kunming 650500, China
2 School of Astronautics, Beihang University, Beijing 102206, China
3 Southwest United Graduate School, Kunming 650500, China

Online: 2025-11-15 Published: 2025-11-10
Corresponding author: YANG Yifan (yifanyang@buaa.edu.cn)
About the author: jitao09@foxmail.com
Supported by:
National Natural Science Foundation of China (62476017).

Abstract

Abstract: In unmanned driving scenarios, the accuracy and robustness of object detection are vital to system performance. To address the false detections and missed detections that existing deep learning-based network models produce when handling small objects and occluded objects in unmanned driving scenarios, an LSDA-YOLO network model is proposed. First, the LocalSimAM (Local Simple and Effective Attention Mechanism) attention mechanism is proposed to mitigate information loss, and it is applied to the Backbone. Meanwhile, the SHSA (Single-Head Self-Attention) attention mechanism is introduced and an information aggregation network is designed to improve the detection of occluded objects. In the Neck, the upsampling ratio is adjusted dynamically, which enhances the model's adaptability to multi-scale features and reduces the missed detection rate for small objects. In the Head, the Adaptive Spatial Feature Fusion (ASFF) strategy is introduced to strengthen the model's multi-scale detection ability. Experimental results show that, on the KITTI dataset, the LSDA-YOLO network model improves mAP_0.5 and mAP_0.5:0.95 by 3.1 percentage points and 3.9 percentage points respectively over the YOLOv11n baseline network model, and that it is suitable for high-precision real-time detection in unmanned driving scenarios.

Key words: Attention mechanism, Unmanned driving, Vehicle detection, Pedestrian detection, Feature fusion
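The two metrics reported in the abstract differ only in how strictly a predicted box must overlap a ground-truth box to count as correct. The following is a minimal sketch, not taken from the paper: it assumes the common convention that mAP_0.5 uses a single IoU threshold of 0.5, while mAP_0.5:0.95 averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The helper names and the example boxes are hypothetical, and the sketch shows only the matching criterion, not the full precision-recall computation behind mAP.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero so that non-overlapping boxes have no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed reading of "0.5:0.95": thresholds 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, truth):
    """Return the IoU thresholds at which pred would count as a match."""
    score = iou(pred, truth)
    return [t for t in THRESHOLDS if score >= t]


# Hypothetical boxes: the prediction is slightly smaller than the truth.
truth = (0, 0, 10, 10)
pred = (1, 1, 10, 10)

print(iou(pred, truth))                 # overlap ratio of the two boxes
print(matched_thresholds(pred, truth))  # thresholds the prediction passes
```

A prediction like this one counts as correct under mAP_0.5 but fails the strictest thresholds, which is why mAP_0.5:0.95 is the more demanding measure of localization quality.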

CLC Number:

TP391.41

JI Tao, YANG Yifan, FENG Yachun, WU Lingfan, LI Xuliang, LI Yawei. Unmanned Driving Scene Object Detection Method Based on Local Features and Feature Fusion[J]. Computer Science, 2025, 52(11A): 250200051-7. https://doi.org/10.11896/jsjkx.250200051

