Computer Science (计算机科学), 2018, Vol. 45, Issue 11A: 17-26.

• Survey •

Overview: Application of Convolution Neural Network in Object Detection

YU Jin-yong1, DING Peng-cheng2, WANG Chao1

  1. Department of Control Engineering, Naval Aeronautical University, Yantai, Shandong 264001, China
  2. Postgraduate Team No. 5, Naval Aeronautical University, Yantai, Shandong 264001, China
  • Online: 2019-02-26 Published: 2019-02-26
  • Corresponding author: DING Peng-cheng (born 1994), male, master's student; main research interest: deep learning. E-mail: 632875352@qq.com
  • About the authors: YU Jin-yong (born 1976), male, Ph.D., associate professor; main research interests: deep learning and intelligent control of aircraft. WANG Chao (born 1988), male, M.S., lecturer; main research interests: intelligent algorithms and aircraft control and guidance.

Abstract: As a branch of machine learning, deep learning has been applied ever more widely across fields and has become a major direction of development in speech recognition, natural language processing, information retrieval, and related areas; in image classification and object detection in particular, it continues to achieve new breakthroughs. This paper first surveys typical applications of convolutional neural networks in object detection. It then compares the structures of several typical convolutional neural networks and summarizes their respective advantages and disadvantages. Finally, it discusses the problems deep learning currently faces and its future directions of development.

Key words: Computer vision, Convolutional neural networks, Object detection, Deep learning

CLC Number:

  • TP751