Overview: Application of Convolution Neural Network in Object Detection

Abstract

Abstract: As a branch of machine learning, deep learning has been widely applied across many fields and has become a major direction of development in speech recognition, natural language processing, and information retrieval; in image classification and object detection in particular, it continues to achieve new breakthroughs. This paper first reviews typical applications of convolutional neural networks in object detection. It then compares the structures of several typical convolutional neural networks and summarizes their respective advantages and disadvantages. Finally, it discusses the problems deep learning currently faces and its future directions of development.

Key words: Computer vision, Convolutional neural networks, Object detection, Deep learning

CLC number: TP751

YU Jin-yong, DING Peng-cheng, WANG Chao. Overview: Application of Convolution Neural Network in Object Detection[J]. Computer Science, 2018, 45(11A): 17-26. https://doi.org/

