Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200051-7. doi: 10.11896/jsjkx.250200051

• Image Processing & Multimedia Technology •

Unmanned Driving Scene Object Detection Method Based on Local Features and Feature Fusion

JI Tao1,2,3, YANG Yifang1,2, FENG Yachun2, WU Lingfan2, LI Xuliang2, LI Yawei2   

  1. School of Information Science, Yunnan Normal University, Kunming 650500, China
  2. School of Astronautics, Beihang University, Beijing 102206, China
  3. Southwest United Graduate School, Kunming 650500, China
  • Online: 2025-11-15  Published: 2025-11-10
  • Supported by:
    National Natural Science Foundation of China (62476017).

Abstract: In unmanned driving, the accuracy and robustness of object detection are vital to system performance. Existing deep learning-based network models produce false detections and missed detections when dealing with small objects and occluded objects in unmanned driving scenarios. To address this, an LSDA-YOLO network model is proposed. First, the LocalSimAM attention mechanism is proposed to address the issue of information loss, and it is applied to the Backbone. Meanwhile, the SHSA attention mechanism is introduced, and an information aggregation network is designed to enhance the detection ability for occluded objects. In the Neck, the upsampling ratio is adjusted dynamically, which enhances the adaptability of the model to multi-scale features and reduces the missed detection rate of small objects. In the Head, the ASFF strategy is introduced to enhance the model’s multi-scale detection ability. Experimental results show that the LSDA-YOLO network model improves mAP0.5 and mAP0.5:0.95 by 3.1 and 3.9 percentage points, respectively, on the KITTI dataset, outperforming the YOLOv11n baseline network model, and that it is suitable for high-precision real-time detection in unmanned driving scenarios.
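The two reported metrics differ only in the intersection-over-union (IoU) threshold used to decide whether a predicted box matches a ground-truth box. The following is a minimal sketch of that distinction, assuming the common COCO-style convention in which mAP0.5:0.95 averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The box format `(x1, y1, x2, y2)` and the helper names `iou` and `thresholds_passed` are illustrative assumptions, not taken from the paper.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style threshold grid behind mAP0.5:0.95; mAP0.5 uses only 0.5.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    value = iou(pred, gt)
    return [t for t in THRESHOLDS if value >= t]
```

Under this convention, a loosely localized box can count as a correct detection for mAP0.5 while failing the stricter thresholds, which is why mAP0.5:0.95 is the more demanding of the two figures.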

Key words: Attention mechanism, Unmanned driving, Vehicle detection, Pedestrian detection, Feature fusion

CLC Number: TP391.41