Computer Science (计算机科学), 2018, Vol. 45, Issue 11A: 17-26.
于进勇1, 丁鹏程2, 王超1
YU Jin-yong1, DING Peng-cheng2, WANG Chao1
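Abstract: As a branch of machine learning, deep learning is being applied ever more widely and has become a major direction of development in speech recognition, natural language processing, information retrieval, and related areas; it also continues to achieve new breakthroughs in image classification and object detection. This paper first reviews typical applications of convolutional neural networks in object detection; it then compares the structures of several typical convolutional neural networks and summarizes their respective strengths and weaknesses; finally, it discusses the current problems of deep learning and directions for future development.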