Computer Science, 2020, Vol. 47, Issue (10): 145-150. doi: 10.11896/jsjkx.190900172

• Computer Graphics & Multimedia •

3D Object Detection Algorithm Based on Two-stage Network

SHEN Qi1, CHEN Yi-lun2, LIU Shu3, LIU Li-gang1   

  1. School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China
  2. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
  3. Tencent Holdings Ltd., Shenzhen, Guangdong 518057, China
  • Received: 2019-09-25 Revised: 2019-11-25 Online: 2020-10-15 Published: 2020-10-16
  • About author: SHEN Qi, born in 1996, postgraduate. Her main research interests include object detection.
    LIU Li-gang, born in 1975, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include computer graphics.

Abstract: This paper proposes VoxelRCNN, a 3D object detection algorithm that operates on LiDAR point clouds. It builds on the VoxelNet 3D object detection network and carries the idea of the RCNN algorithm over from 2D object detection to 3D object detection. VoxelRCNN consists of two stages. Stage-1 uses a region proposal network to extract candidate region boxes, and stage-2 refines the boxes extracted in stage-1 to obtain more accurate detection results. The stage-1 network voxelizes the point cloud of the whole scene, extracts features from each voxel as the input to a convolutional neural network, and computes the final feature map with that network. Bounding box information is then learned by regression from the feature map. Stage-2 takes the candidate regions and the features extracted in stage-1, obtains equal-sized features for each candidate by pooling, and regresses the bounding box information again. Experimental results on the KITTI dataset show that the proposed network structure performs well.
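
The voxelization step of stage-1 can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `voxelize`, its parameters, and the use of a plain per-voxel mean as the voxel feature are assumptions made for illustration only. VoxelNet-style networks learn voxel features with trainable layers rather than averaging points.

```python
import numpy as np


def voxelize(points, voxel_size, range_min, range_max):
    """Group points into voxels (hypothetical helper, illustration only).

    points:     (N, C) array whose first three columns are x, y, z.
    voxel_size: edge lengths (dx, dy, dz) of one voxel.
    range_min, range_max: corners of the axis-aligned region to keep.

    Returns (coords, features): integer indices of the occupied voxels
    and the mean of the points that fall inside each of them.
    """
    points = np.asarray(points, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    range_min = np.asarray(range_min, dtype=np.float64)
    range_max = np.asarray(range_max, dtype=np.float64)

    # Keep only points inside the half-open box [range_min, range_max).
    xyz = points[:, :3]
    inside = np.all((xyz >= range_min) & (xyz < range_max), axis=1)
    points = points[inside]
    if points.shape[0] == 0:
        return (np.zeros((0, 3), dtype=np.int64),
                np.zeros((0, points.shape[1])))

    # Integer voxel index of every remaining point.
    idx = np.floor((points[:, :3] - range_min) / voxel_size).astype(np.int64)

    # Occupied voxels, plus the voxel row that each point belongs to.
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Simple stand-in for a learned voxel feature: the per-voxel mean.
    sums = np.zeros((coords.shape[0], points.shape[1]))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=coords.shape[0])
    features = sums / counts[:, None]
    return coords, features
```

Calling `voxelize(cloud, (0.2, 0.2, 0.4), (0, -40, -3), (70.4, 40, 1))` on an `(N, 4)` array of x, y, z, reflectance values would return one row per occupied voxel; the voxel size and range here are placeholder values, not settings reported in the paper. Only occupied voxels are returned, which keeps the representation sparse before it is passed to a convolutional network.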

Key words: 3D object detection, Convolutional neural network, KITTI dataset, Region proposal network, Voxelization

CLC Number: 

  • TP391.41