Computer Science, 2025, Vol. 52, Issue (11A): 241100112-7. doi: 10.11896/jsjkx.241100112

• Image Processing & Multimedia Technology •

Three-dimensional Object Detection Algorithm of Road Scene Based on Attention Mechanism

CAO Wenbo1, WEI Mingyang1, DUAN Xiaoyong1, LIU Xueyuan1,2   

  1 College of Mechanical Engineering and Transportation, Southwest Forestry University, Kunming 650224, China
  2 Key Laboratory of Environmental Protection and Safety of Motor Vehicles in Plateau Mountain Areas, Kunming 650224, China
  • Online: 2025-11-15 Published: 2025-11-10
  • About author: CAO Wenbo, born in 2000, postgraduate. Her main research interest is laser radar.
    LIU Xueyuan, born in 1979, Ph.D., senior experimentalist. His main research interest is automobiles.
  • Supported by:
    Agricultural Joint Special Project of the Department of Science and Technology of Yunnan Province (202301BD070001-014)

Abstract: With the development of deep learning and on-board LiDAR, driverless cars place increasingly high demands on detection: obstacles on the road must be detected accurately, and detection must also be fast. In complex road scenes, occluding obstacles and the small size of some targets make those targets difficult to detect accurately. To address this problem, this paper proposes an improved 3D object detection method based on the PointPillars model that achieves higher accuracy while preserving detection speed. First, a variety of data augmentation operations are introduced to increase the diversity and size of the dataset and to reduce overfitting. Then, an attention matrix is added to the pillar feature extraction stage, and the importance of each voxel is adjusted dynamically according to its position and semantic information, so that the model focuses on the features most useful for object detection. Finally, channel attention (CA) and spatial attention (SA) modules are added in sequence to the backbone network; they strengthen the model's response to useful information, suppress the interference of unimportant features with the detection results, and thus improve the representation of target features. Experimental results show that the improved model achieves higher detection accuracy in every category and at every detection difficulty level.
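
The abstract does not give the internal layout of the CA and SA modules, so the following is only a minimal NumPy sketch of one common formulation: channel attention followed by spatial attention in the style of CBAM (Woo et al., 2018). The function names, the reduction ratio `r`, the 7x7 kernel size, and the random weights are all assumptions made for illustration; they are not taken from the paper, and a real backbone would learn these weights and operate on batched pseudo-image features.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) are the weights of a small MLP that is
    shared between the average-pooled and max-pooled channel descriptors.
    """
    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    weights = sigmoid(mlp(avg) + mlp(mx))    # (C,), each in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat, kernel):
    """Reweight spatial positions of a (C, H, W) feature map.

    kernel (2, k, k) is convolved over the channel-wise mean and max maps;
    zero padding keeps the attention map at size (H, W).
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = feat.shape[1:]
    logits = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def ca_then_sa(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, in sequence."""
    return spatial_attention(channel_attention(feat, w1, w2), kernel)


# Toy usage with random (untrained) weights; all sizes are hypothetical.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = ca_then_sa(feat, w1, w2, kernel)
print(out.shape)  # (8, 4, 4)
```

Because both attention maps pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1. That is the sense in which such modules can suppress less informative channels and positions while leaving the feature map shape unchanged.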

Key words: Laser radar, 3D object detection, Point cloud, Data enhancement, Attention mechanism

CLC Number: 

  • TN958