Computer Science (计算机科学), 2018, Vol. 45, Issue (9): 11-19. doi: 10.11896/j.issn.1002-137X.2018.09.002
WANG Hui-ling (王慧玲)^1,2, QI Xiao-long (綦小龙)^1,2, WU Gang-shan (武港山)^2
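Abstract: Object detection is an active research topic in computer vision. In recent years, convolutional neural networks (CNNs), a deep learning technique, have performed strongly on object detection tasks. This paper surveys research progress on deep learning for object detection. First, it introduces the two approaches to object detection and the commonly used datasets. It also analyzes the advantages that deep-learning-based methods offer for object detection. Second, it follows the development of deep-learning-based object detection, introducing the classic CNN models these methods use and analyzing the characteristics of each model. Third, it analyzes and summarizes the methods in terms of their feature-extraction ability, detection speed, and the key techniques they use. Finally, it considers the difficulties and challenges that remain in deep-learning-based object detection and offers an outlook on future trends.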