Computer Science (计算机科学), 2019, Vol. 46, Issue (3): 63-73. doi: 10.11896/j.issn.1002-137X.2019.03.008
陈超,齐峰
CHEN Chao, QI Feng
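Abstract: In recent years, deep learning has achieved a series of remarkable research results in computer vision, speech recognition, natural language processing, and medical image processing. Among the different types of deep neural networks, convolutional neural networks (CNNs) have been studied most extensively. This is reflected not only in flourishing academic research but also in the substantial practical impact and commercial value they have brought to related industries. With the rapid growth of annotated datasets and the large improvement in graphics processing unit (GPU) performance, research on CNNs has advanced quickly and has produced strong results across a variety of computer vision tasks. This paper first reviews the history of CNNs. Second, it introduces the basic structure of a CNN and the role of each component. It then describes in detail the improvements made to convolutional layers, pooling layers, and activation functions, and summarizes the representative network architectures proposed since 1998: AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet, DenseNet, DPN, and SENet. Within computer vision, it focuses on recent progress in applying CNNs to image classification/localization, object detection, object segmentation, object tracking, action recognition, and image super-resolution reconstruction. Finally, it summarizes the open problems and challenges in CNN research.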