Computer Science (计算机科学), 2018, Vol. 45, Issue (9): 11-19. doi: 10.11896/j.issn.1002-137X.2018.09.002

• Survey •

Research Progress of Object Detection Technology Based on Convolutional Neural Network in Deep Learning

WANG Hui-ling (1,2), QI Xiao-long (1,2), WU Gang-shan (2)

  1. Department of Electronics and Information Engineering, Yili Normal University, Yining, Xinjiang 835000, China
  2. Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China
  • Received: 2017-12-12 Online: 2018-09-20 Published: 2018-10-10
  • Corresponding authors: QI Xiao-long (born 1981), male, Ph.D. candidate, lecturer; main research interests: machine learning and pattern recognition. WU Gang-shan (born 1967), male, professor, Ph.D. supervisor; main research interests: media content analysis and multimedia information retrieval. E-mail: gswu@nju.edu.cn
  • About the author: WANG Hui-ling (born 1981), female, Ph.D. candidate, lecturer; main research interests: computer vision, image analysis and processing. E-mail: dg1633019@smail.nju.edu.cn
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61663045).

Abstract: Object detection is an active research topic in computer vision. In recent years, convolutional neural networks in deep learning have performed prominently in object detection tasks. This paper surveys the research progress of deep learning in object detection. First, the two approaches to object detection and the commonly used datasets are introduced, and the advantages of deep learning based methods on object detection tasks are analyzed. Second, following the development of deep learning based object detection, the classical convolutional neural network models used by these methods are introduced, and the characteristics of each model are analyzed. Then, the methods are analyzed and summarized in terms of feature extraction ability, detection speed, and the key techniques they use. Finally, in light of the difficulties and challenges facing deep learning based object detection, future development trends are discussed.

Key words: Convolutional neural network, Deep learning, Object detection

CLC Number: 

  • TP183