Computer Science ›› 2020, Vol. 47 ›› Issue (10): 145-150. doi: 10.11896/jsjkx.190900172

• Computer Graphics & Multimedia •

3D Object Detection Algorithm Based on Two-stage Network

SHEN Qi1, CHEN Yi-lun2, LIU Shu3, LIU Li-gang1

  1 School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
  2 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
  3 Tencent Holdings Ltd., Shenzhen, Guangdong 518057, China
  • Received: 2019-09-25 Revised: 2019-11-25 Online: 2020-10-15 Published: 2020-10-16
  • Corresponding author: LIU Li-gang (lgliu@mail.ustc.edu.cn)
  • Author e-mail: sq000333@mail.ustc.edu.cn
  • About author: SHEN Qi, born in 1996, postgraduate. Her main research interests include object detection.
    LIU Li-gang, born in 1975, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include computer graphics.

Abstract: This paper proposes VoxelRCNN (Voxelization Region-Based Convolutional Neural Networks), a 3D object detection algorithm for LiDAR point clouds. Built on the VoxelNet 3D object detection network, it carries the idea of RCNN over from 2D object detection to 3D object detection. VoxelRCNN consists of two stages. Stage 1 uses a region proposal network to extract candidate boxes, and stage 2 refines the boxes extracted in stage 1 to obtain more accurate detection results. The stage-1 network voxelizes the point cloud of the whole scene, extracts a feature for each voxel as the input of a convolutional neural network, computes the final feature map with that network, and regresses the bounding box parameters from the feature map. The stage-2 network takes the candidate regions and features extracted in stage 1, pools them into features of equal size, and regresses the bounding box parameters again. Experimental results on the KITTI dataset show that the proposed network structure is effective.

Key words: 3D object detection, Convolutional neural network, KITTI dataset, Region proposal network, Voxelization

CLC Number: 

  • TP391.41
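
The two operations the abstract names, voxelizing the point cloud in stage 1 and pooling candidate regions to equal-size features in stage 2, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names (`voxelize`, `bev_count_map`, `roi_max_pool`), the point-count feature, and the 2D bird's-eye-view grid are all assumptions chosen for illustration. VoxelNet learns its voxel features, and the pooling details of VoxelRCNN are not given in the abstract.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z) points by the integer index of the voxel that contains them."""
    voxels = defaultdict(list)
    for p in points:
        idx = tuple(floor((c - o) / s) for c, o, s in zip(p, origin, voxel_size))
        voxels[idx].append(p)
    return dict(voxels)


def bev_count_map(voxels, nx, ny):
    """Hypothetical stand-in feature: number of points per (x, y) cell, summed over z."""
    grid = [[0] * ny for _ in range(nx)]
    for (ix, iy, _iz), pts in voxels.items():
        if 0 <= ix < nx and 0 <= iy < ny:
            grid[ix][iy] += len(pts)
    return grid


def _ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (r0, c0, r1, c1), end-exclusive, to an out_h x out_w grid.

    Bin edges use floor for the start and ceil for the end, so every bin
    covers at least one cell as long as the region is non-empty.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        rs, re = r0 + (i * h) // out_h, r0 + _ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            cs, ce = c0 + (j * w) // out_w, c0 + _ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# Toy scene: four points, 1 m voxels, a 4 x 4 bird's-eye-view grid.
points = [(0.2, 0.3, 0.1), (0.8, 0.1, 0.5), (1.5, 0.2, 0.0), (3.9, 3.1, 0.2)]
voxels = voxelize(points, voxel_size=(1.0, 1.0, 1.0))
grid = bev_count_map(voxels, nx=4, ny=4)

# Two candidate regions of different sizes both pool to a 2 x 2 feature.
pooled_full = roi_max_pool(grid, (0, 0, 4, 4))
pooled_small = roi_max_pool(grid, (0, 0, 3, 2))
```

In this sketch the 4 x 4 region and the 3 x 2 region both yield a 2 x 2 output, which is the property that lets a second-stage head with fixed input size process candidate boxes of any size.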