Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200051-7. doi: 10.11896/jsjkx.250200051

• Image Processing & Multimedia Technology •

Unmanned Driving Scene Object Detection Method Based on Local Features and Feature Fusion

JI Tao1,2,3, YANG Yifang1,2, FENG Yachun2, WU Lingfan2, LI Xuliang2, LI Yawei2   

  1 School of Information Science, Yunnan Normal University, Kunming 650500, China
  2 School of Astronautics, Beihang University, Beijing 102206, China
  3 Southwest United Graduate School, Kunming 650500, China
  • Online: 2025-11-15  Published: 2025-11-10
  • About author: JI Tao, born in 1999, postgraduate. His main research interests include object detection and embedded systems.
    YANG Yifan, born in 1986, Ph.D., associate professor, master's supervisor. His main research interests include embedded edge intelligent computing, image enhancement, and object recognition and tracking.
  • Supported by:
    National Natural Science Foundation of China (62476017).

Abstract: In unmanned driving, the accuracy and robustness of object detection are vital to system performance. To address the false detections and missed detections that existing deep learning-based network models produce on small and occluded objects in unmanned driving scenarios, an LSDA-YOLO network model is proposed. First, the LocalSimAM attention mechanism is proposed to address information loss and is applied to the Backbone. The SHSA attention mechanism is also introduced, and an information aggregation network is designed to improve the detection of occluded objects. In the Neck, the upsampling ratio is adjusted dynamically, which improves the model's adaptability to multi-scale features and reduces the missed detection rate for small objects. In the Head, the ASFF strategy is introduced to strengthen the model's multi-scale detection ability. Experimental results show that, on the KITTI dataset, the LSDA-YOLO network model improves mAP0.5 and mAP0.5:0.95 by 3.1 and 3.9 percentage points respectively over the YOLOv11n baseline network model, making it suitable for high-precision real-time detection in unmanned driving scenarios.
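
The abstract does not say how LocalSimAM differs from the SimAM module its name suggests it extends, so the following is only a minimal sketch of the original, parameter-free SimAM weighting, not the authors' LocalSimAM. It is written in plain Python for readability. The function names `simam_channel` and `simam`, the nested-list layout (channels, rows, columns), and the regularizer default `lam=1e-4` are illustrative assumptions.

```python
import math


def simam_channel(channel, lam=1e-4):
    """Apply SimAM-style reweighting to one channel (a 2D list of floats).

    Assumption: `lam` is a small regularizer; 1e-4 is used here only
    as an illustrative default.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" activations in the channel
    if n < 1:
        raise ValueError("channel needs at least two activations")
    mean = sum(values) / len(values)
    # Squared deviation of every activation from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out
            # from the rest of the channel.
            e_inv = d / (4.0 * (variance + lam)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-e_inv))  # sigmoid
            out_row.append(v * gate)
        out.append(out_row)
    return out


def simam(feature_map, lam=1e-4):
    """Apply the reweighting independently to each channel of a
    (channels, rows, columns) nested list."""
    return [simam_channel(channel, lam) for channel in feature_map]
```

Calling `simam([[[1.0, 1.0], [1.0, 5.0]]])` returns a map of the same shape in which the outlying activation (5.0) receives a larger gate than its neighbours. No learnable parameters are involved, which is why a module of this kind can be placed in a Backbone without increasing the parameter count.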

Key words: Attention mechanism, Unmanned driving, Vehicle detection, Pedestrian detection, Feature fusion

CLC Number: 

  • TP391.41