Computer Science, 2020, Vol. 47, Issue 4: 94-102. doi: 10.11896/jsjkx.190400142
张鹏, 宋一凡, 宗立波, 刘立波
ZHANG Peng, SONG Yi-fan, ZONG Li-bo, LIU Li-bo
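Abstract: Object detection algorithms are widely used and have long been an active research topic in computer vision. In recent years, with the development of deep learning, research on object detection in 3D images has made major breakthroughs. Compared with 2D object detection, 3D object detection incorporates depth information and can provide spatial scene information such as the position, orientation, and size of a target, and it has developed rapidly in autonomous driving and robotics. This paper first gives an overview of deep-learning-based 2D object detection algorithms. It then analyzes representative and pioneering 3D object detection algorithms, grouped by data acquisition mode: images, LiDAR, and multiple sensors. In the context of autonomous driving, it compares the performance, strengths, and limitations of different 3D object detection algorithms. Finally, it summarizes the practical significance of 3D object detection and the problems that remain open, and discusses its development directions and new challenges.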