Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 63-70. doi: 10.11896/jsjkx.201100032
LIU Hua-ling, PI Chang-peng, LIU Meng-yao, TANG Xin
Abstract: In machine learning, the loss function of a traditional model is convex, so a global optimum exists and can be reached by classical gradient descent. In deep learning, by contrast, the implicit form of the model function and the permutation symmetry of neurons within a layer make the loss function non-convex, so gradient descent cannot guarantee a global optimum; even advanced optimizers such as SGDM, Adam, Adagrad, and RMSprop cannot escape local optima, and although they converge considerably faster, they still fall short of practical needs. Existing optimizers are largely incremental fixes for the defects or limitations of earlier ones: they yield modest gains, and their performance is inconsistent across datasets. This paper proposes a new optimization mechanism, Rain, which adapts the Dropout mechanism of deep neural networks and applies it at the optimizer level. Rain is not an improved version of any existing optimizer but a third-party mechanism independent of all of them; it can be paired with any optimizer to improve its adaptability to a given dataset. The mechanism aims to optimize the model's performance on the training set; generalization to the test set is outside its scope. Experiments pairing two models, Deep Crossing and FM, with five optimizers on the Frappe and MovieLens datasets show that models equipped with Rain attain a markedly lower training loss and converge faster, while their test-set performance is nearly identical to that of the original models, i.e., generalization remains poor.
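The abstract does not spell out Rain's update rule, so the following is only a minimal sketch of the stated idea, assuming that, like Dropout, the mechanism randomly masks part of each update so that the wrapped optimizer moves a different random subset of parameters at every step. The class name RainWrapper, the drop_prob parameter, the gradient-level placement of the mask, and the Dropout-style rescaling are all illustrative assumptions, not the authors' specification.

```python
import torch

class RainWrapper:
    """Hypothetical sketch of a Rain-style mechanism: apply a
    Dropout-style random mask to the gradients before delegating
    the actual step to any wrapped optimizer. Illustrative only;
    the paper's abstract does not specify the exact update rule."""

    def __init__(self, optimizer, drop_prob=0.1):
        self.optimizer = optimizer          # e.g. SGDM, Adam, Adagrad, RMSprop
        self.keep_prob = 1.0 - drop_prob    # probability a coordinate survives

    def zero_grad(self):
        self.optimizer.zero_grad()

    @torch.no_grad()
    def step(self):
        for group in self.optimizer.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Bernoulli mask: each gradient coordinate is kept with
                # probability keep_prob and rescaled, as in Dropout, so the
                # expected magnitude of the update stays unchanged.
                mask = torch.bernoulli(torch.full_like(p.grad, self.keep_prob))
                p.grad.mul_(mask).div_(self.keep_prob)
        self.optimizer.step()

# Usage with any base optimizer, e.g. on an FM or Deep Crossing model:
#   base = torch.optim.Adam(model.parameters(), lr=1e-3)
#   opt = RainWrapper(base, drop_prob=0.1)
#   opt.zero_grad(); loss.backward(); opt.step()
```

Because the wrapper only touches param_groups, the same training loop works unchanged for any base optimizer, which mirrors the paper's claim that Rain is independent of, and composable with, all optimization algorithms.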