Computer Science, 2019, Vol. 46, Issue (3): 63-73. doi: 10.11896/j.issn.1002-137X.2019.03.008

• Surveys •

Review on Development of Convolutional Neural Network and Its Application in Computer Vision

CHEN Chao, QI Feng   

  1. (School of Management Science and Engineering, Shandong Normal University, Jinan 250000, China)
  • Received: 2018-03-05 Revised: 2018-06-27 Online: 2019-03-15 Published: 2019-03-22

Abstract: In recent years, deep learning has achieved remarkable results in fields such as computer vision, speech recognition, natural language processing and medical image processing. Among the different types of deep neural networks, the convolutional neural network (CNN) has been the most extensively studied, which is reflected both in flourishing academic research and in its substantial practical impact and commercial value for related industries. With the rapid growth of annotated sample datasets and the drastic improvement of GPU performance, research on convolutional neural networks has developed rapidly and has achieved remarkable results in a variety of computer vision tasks. This paper first reviews the history of convolutional neural networks. It then introduces the basic structure of a convolutional neural network and the function of each component. Next, it describes in detail the improvements made to the convolution layer, pooling layer and activation function. It also summarizes typical neural network architectures since 1998 (such as AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet, DenseNet, DPN and SENet). In the field of computer vision, the paper focuses on the latest research progress of convolutional neural networks in image classification/localization, object detection, object segmentation, object tracking, action recognition and image super-resolution reconstruction. Finally, it summarizes the open problems and challenges facing convolutional neural networks.

Key words: Artificial intelligence, Deep learning, Convolutional neural network, Computer vision

CLC Number: TP183