Review on Development of Convolutional Neural Network and Its Application in Computer Vision

doi:10.11896/j.issn.1002-137X.2019.03.008

Abstract

Abstract: In recent years, deep learning has achieved a series of remarkable results in computer vision, speech recognition, natural language processing, and medical image processing. Among the different types of deep neural networks, convolutional neural networks (CNNs) have been the most extensively studied, which is reflected not only in flourishing academic research but also in their substantial practical impact and commercial value for related industries. With the rapid growth of annotated datasets and the large improvement in graphics processing unit (GPU) performance, research on CNNs has advanced quickly and has produced strong results across a wide range of computer vision tasks. This paper first reviews the history of CNNs. It then introduces their basic structure and the role of each component. Next, it describes in detail the improvements made to the convolutional layer, the pooling layer, and the activation function, and summarizes representative network architectures since 1998: AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet, DenseNet, DPN, and SENet. In the field of computer vision, it focuses on recent progress in applying CNNs to image classification/localization, object detection, object segmentation, object tracking, action recognition, and image super-resolution reconstruction. Finally, it summarizes the open problems and challenges in CNN research.

Key words: Artificial intelligence, Computer vision, Convolutional neural network, Deep learning

CLC number: TP183

CHEN Chao, QI Feng. Review on Development of Convolutional Neural Network and Its Application in Computer Vision[J]. Computer Science, 2019, 46(3): 63-73. https://doi.org/10.11896/j.issn.1002-137X.2019.03.008
