Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 155-159.

• Intelligent Computing •

Research on Optimization Algorithm of Deep Learning

TONG Wei-guo, LI Min-xia, ZHANG Yi-ke   

  1. Department of Automation, North China Electric Power University, Baoding, Hebei 071003, China
  • Online: 2019-02-26 Published: 2019-02-26
  • Corresponding author: LI Min-xia (born 1993), female, master's student; her main research interests include digital image processing and pattern recognition. E-mail: 15612136708@163.com
  • About the authors: TONG Wei-guo (born 1967), male, Ph.D., associate professor; his main research interests include electrical engineering theory and new technology, image processing technology and sensors. E-mail: twg1018@163.com. ZHANG Yi-ke (born 1993), male; his main research interests include digital image processing and pattern recognition.
  • Funding: This work was supported by the Natural Science Foundation of Hebei Province.

Abstract: Deep learning is a popular research direction in machine learning, and its training and optimization algorithms have received considerable attention and study, becoming an important driving force in the development of artificial intelligence. Based on the basic structure of the convolutional neural network, this paper introduces the choice of activation function and network structure, the setting of hyperparameters, and the optimization algorithms used in network training, analyzes the advantages and disadvantages of each algorithm, and verifies them using the Cifar-10 dataset as training samples. The experimental results show that appropriate training methods and optimization algorithms can effectively improve the accuracy and convergence of the network. Finally, the optimal algorithm was applied to image recognition of actual transmission lines and achieved good results.

Key words: Activation function, Convolutional neural network, Deep learning, Hyperparameter, Optimization algorithm, Regularization
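As a rough illustration of two of the optimization algorithms the paper compares, the following NumPy sketch implements the momentum SGD and Adam parameter-update rules on a toy one-dimensional quadratic. The learning rates and decay constants are common illustrative defaults, not the settings used in the paper's experiments.

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, mu=0.9):
    # Velocity v accumulates an exponentially decaying sum of past gradients.
    v = mu * v - lr * grad
    return w + v, v

def adam(w, grad, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # First (m) and second (s) moment estimates, bias-corrected by step count t.
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Minimize f(w) = w^2 (gradient 2w) from the same starting point.
w1, v = 5.0, 0.0
w2, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    w1, v = sgd_momentum(w1, 2 * w1, v)
    w2, m, s = adam(w2, 2 * w2, m, s, t)
print(w1, w2)  # both end up near the minimum at w = 0
```

Adam can be read as momentum SGD plus a per-parameter step-size scaling by the running second-moment estimate, which is why it is often less sensitive to the raw learning-rate choice than plain SGD.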

CLC number: TP391