Computer Science ›› 2024, Vol. 51 ›› Issue (8): 192-199. doi: 10.11896/jsjkx.230500071

• Computer Graphics & Multimedia •

Monocular 3D Object Detection Based on Height-Depth Constraint and Edge Fusion

PU Bin1, LIANG Zhengyou1,2, SUN Yu1,2

  1. 1 School of Computer and Electronics Information, Guangxi University, Nanning 530004, China
    2 Guangxi Key Laboratory of Multimedia Communication and Network Technology, Guangxi University, Nanning 530004, China
  • Received: 2023-05-10  Revised: 2023-08-10  Publication date: 2024-08-15  Release date: 2024-08-13
  • Corresponding author: LIANG Zhengyou (zhyliang@gxu.edu.cn)
  • About author: (pbbingo@foxmail.com)
    PU Bin, born in 1997, postgraduate. His main research interests include monocular 3D object detection and image classification.
    LIANG Zhengyou, born in 1968, Ph.D, professor, is a member of CCF (No.16803M). His main research interests include computer vision, artificial intelligence and parallel distributed computing.
  • Supported by:
    National Natural Science Foundation of China (62171145).

Abstract: Monocular 3D object detection aims to detect objects in 3D from a single monocular image, and most existing monocular 3D object detection algorithms build on classical 2D object detection algorithms. Directly regressing instance depth yields inaccurate depth estimates and therefore poor detection accuracy. To address this problem, a monocular 3D object detection algorithm based on a height-depth constraint and edge feature fusion is proposed. For instance depth estimation, the height-depth constraint is computed from the instance's 3D height and 2D height under the geometric projection relationship, which converts the prediction of instance depth into the prediction of the object's 2D height and 3D height. To handle objects truncated by the image edges, an edge fusion module based on depthwise separable convolution is used to strengthen feature extraction for edge objects. To handle the multi-scale problem caused by objects lying at different distances in the image, a multi-scale mixed attention module based on dilated convolution is designed to enhance multi-scale feature extraction on the highest-level feature map. Experimental results show that, on the KITTI dataset, the proposed method improves detection accuracy for the car category by 7.11% over the baseline model and outperforms current methods.

Key words: Monocular 3D object detection, Height-depth constraint, Edge fusion, Multi-scale feature, Attention mechanism
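
The height-depth constraint named in the abstract can be illustrated with the standard pinhole-camera relation z = f · H / h, where f is the focal length in pixels, H the object's 3D height in metres, and h its projected 2D height in pixels. The sketch below is only an assumption about the general form of such a constraint; the abstract does not give the paper's exact formulation, and the function name, parameter names, and numbers are hypothetical.

```python
def depth_from_heights(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Estimate instance depth (metres) from the pinhole relation z = f * H / h.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if height_2d_px <= 0:
        raise ValueError("projected 2D height must be positive")
    return focal_px * height_3d_m / height_2d_px


# Illustrative values only (not taken from the paper):
# a 1.5 m tall car that spans 50 px in an image with a 700 px focal length.
z = depth_from_heights(focal_px=700.0, height_3d_m=1.5, height_2d_px=50.0)
print(z)  # 21.0
```

Under this relation, a network that predicts the 3D height and the 2D height of an object obtains the depth by a closed-form computation instead of regressing it directly, which is the conversion the abstract describes.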

CLC Number:

  • TP391