Computer Science, 2024, Vol. 51, Issue (8): 192-199. doi: 10.11896/jsjkx.230500071

• Computer Graphics & Multimedia •

Monocular 3D Object Detection Based on Height-Depth Constraint and Edge Fusion

PU Bin1, LIANG Zhengyou1,2, SUN Yu1,2   

  1 School of Computer and Electronics Information, Guangxi University, Nanning 530004, China
  2 Guangxi Key Laboratory of Multimedia Communication and Network Technology, Guangxi University, Nanning 530004, China
  • Received: 2023-05-10  Revised: 2023-08-10  Online: 2024-08-15  Published: 2024-08-13
  • About author: PU Bin, born in 1997, postgraduate. His main research interests include monocular 3D object detection and image classification.
    LIANG Zhengyou, born in 1968, Ph.D., professor, is a member of CCF (No. 16803M). His main research interests include computer vision, artificial intelligence, and parallel and distributed computing.
  • Supported by:
    National Natural Science Foundation of China (62171145).

Abstract: Monocular 3D object detection aims to detect objects in 3D from a single monocular image, and most existing algorithms build on classical 2D object detectors. Directly regressing instance depth is inaccurate and degrades detection accuracy. To address this, a monocular 3D object detection algorithm based on a height-depth constraint and edge feature fusion is proposed. For instance depth estimation, the height-depth constraint is derived from the instance's 3D height and 2D height under the geometric projection relationship, which converts the prediction of instance depth into the prediction of the object's 2D and 3D heights. To handle objects truncated at image edges, an edge fusion module based on depthwise separable convolution is used to enhance feature extraction for edge objects. To handle the multi-scale problem caused by objects appearing at different distances in the image, a multi-scale mix attention module based on dilated convolution is designed to enhance multi-scale feature extraction on the highest-level feature map. Experimental results on the KITTI dataset demonstrate the effectiveness of the proposed method: it improves car-category detection accuracy by 7.11% over the baseline model and outperforms current methods.
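
The height-depth constraint described in the abstract can be illustrated with the standard pinhole-camera relation z = f * H / h, where H is the object's 3D height, h is its projected 2D height in pixels, and f is the focal length in pixels. The following is only a minimal sketch under that assumption. The function name, parameter names, and numeric values are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
def depth_from_heights(focal_length_px: float,
                       height_3d_m: float,
                       height_2d_px: float) -> float:
    """Estimate instance depth (meters) from the pinhole relation z = f * H / h.

    focal_length_px: camera focal length in pixels (assumed known from calibration)
    height_3d_m:     predicted physical height of the object in meters
    height_2d_px:    predicted projected height of the object in pixels
    """
    if height_2d_px <= 0:
        raise ValueError("2D height must be positive")
    return focal_length_px * height_3d_m / height_2d_px


# Illustrative values only: a 1.5 m tall car that spans 50 px in the image,
# seen by a camera with a 721.5 px focal length.
z = depth_from_heights(721.5, 1.5, 50.0)
print(f"estimated depth: {z:.3f} m")
```

Under this relation, an error in either predicted height propagates directly into the depth estimate, which is why the two heights become the quantities the network must predict accurately.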

Key words: Monocular 3D object detection, Height-Depth constraint, Edge fusion, Multi-scale feature, Attention mechanism

CLC Number: 

  • TP391