Review on Development of Convolutional Neural Network and Its Application in Computer Vision

doi:10.11896/j.issn.1002-137X.2019.03.008

Abstract

In recent years, deep learning has achieved a series of remarkable research results in fields such as computer vision, speech recognition, natural language processing and medical image processing. Among the different types of deep neural networks, the convolutional neural network has been the most extensively studied; it has not only flourished in academia but also had a substantial practical impact and created commercial value in related industries. With the rapid growth of annotated sample data sets and the drastic improvement of GPU performance, research on convolutional neural networks has developed rapidly and achieved remarkable results in various computer vision tasks. This paper first reviews the history of the convolutional neural network. It then introduces the basic structure of the convolutional neural network and the function of each component. Next, it describes in detail the improvements made to the convolutional layer, pooling layer and activation function. It also summarizes typical neural network architectures since 1998 (such as AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet, DenseNet, DPN and SENet). In the field of computer vision, the paper focuses on the latest research progress of convolutional neural networks in image classification/localization, object detection, object segmentation, object tracking, action recognition and image super-resolution reconstruction. Finally, it summarizes the open problems and challenges facing convolutional neural networks.

Keywords: Artificial intelligence, Computer vision, Convolutional neural network, Deep learning

CLC Number: TP183

CHEN Chao, QI Feng. Review on Development of Convolutional Neural Network and Its Application in Computer Vision [J]. Computer Science, 2019, 46(3): 63-73.


