Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 63-70. doi: 10.11896/jsjkx.201100032

• Intelligent Computing •


New Optimization Mechanism: Rain

LIU Hua-ling, PI Chang-peng, LIU Meng-yao, TANG Xin   

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201600, China
  • Online: 2021-11-10  Published: 2021-11-12
  • Corresponding author: PI Chang-peng (2754759189@qq.com)
  • About author: LIU Hua-ling (liuhl@suibe.edu.cn), born in 1964, Ph.D., professor. Her main research interests include privacy-preserving data mining and Internet financial intelligent monitoring.
    PI Chang-peng, born in 1996, postgraduate. His main research interests include machine learning, deep learning and semantic recognition.


Abstract: In machine learning, the loss function of a traditional model is convex, so it has a global optimal solution that classical gradient descent (SGD) can find. In deep learning, however, the implicit expression of the model function and the interchangeability of neurons within the same layer make the loss function non-convex: traditional gradient descent cannot find the global optimum, and even more advanced optimizers such as SGDM, Adam, Adagrad and RMSprop cannot escape local optima; although they have greatly improved convergence speed, they still fall short of practical needs. Existing optimization algorithms are mostly incremental fixes for the defects or limitations of earlier ones, yielding slight improvements whose effect is inconsistent across data sets. This paper proposes a new optimization mechanism, Rain, which borrows the Dropout mechanism from deep neural networks and implements it inside the optimization algorithm. Rain is not an improved version of any existing optimizer; it is a third-party mechanism independent of all optimization algorithms, yet it can be combined with any of them to improve their adaptability to a data set. The mechanism aims to optimize model performance on the training set; generalization on the test set is not its concern. Experiments pairing two models, Deep Crossing and FM, with five optimization algorithms on the Frappe and MovieLens data sets show that models equipped with Rain achieve a markedly lower training loss and faster convergence, while their test-set performance is almost identical to that of the original models, i.e., generalization is poor.

Key words: Deep learning, Optimization algorithm, Dropout mechanism, Rain mechanism, Convergence speed
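
The non-convexity claim in the abstract rests on the interchangeability of neurons in the same layer: permuting the hidden units of a network, together with their incoming and outgoing weights, leaves the network function (and hence the loss) unchanged, so every minimum has many symmetric copies and the loss surface cannot be convex. A minimal PyTorch sketch (illustrative only, not from the paper; the toy network shapes are assumptions) checks this numerically:

import torch

torch.manual_seed(0)
W1, b1 = torch.randn(4, 3), torch.randn(4)   # hidden layer: 4 units, 3 inputs
W2, b2 = torch.randn(1, 4), torch.randn(1)   # output layer
x = torch.randn(5, 3)                        # a batch of 5 inputs

def forward(W1, b1, W2, b2, x):
    # One-hidden-layer network: relu(x W1^T + b1) W2^T + b2
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

# Permute the hidden units: rows of W1, entries of b1, columns of W2.
perm = torch.randperm(4)
same = torch.allclose(forward(W1, b1, W2, b2, x),
                      forward(W1[perm], b1[perm], W2[:, perm], b2, x))
print(same)  # True: the permuted weights compute the identical function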

CLC Number: TP391
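
The abstract specifies Rain only as a Dropout-style mechanism that is integrated into the optimization algorithm and can be paired with any optimizer; the exact update rule appears in the paper body, not on this page. Purely as a hedged illustration of that general idea, the sketch below wraps an arbitrary PyTorch optimizer and applies an inverted-Dropout random mask to the gradients before each step. The names RainWrapper and drop_rate are hypothetical, not the paper's.

import torch

class RainWrapper:
    """Hypothetical sketch: wraps any torch.optim optimizer and randomly
    drops gradient coordinates, mirroring how Dropout drops activations."""

    def __init__(self, base_optimizer, drop_rate=0.1):
        self.base = base_optimizer
        self.p = drop_rate

    def zero_grad(self):
        self.base.zero_grad()

    @torch.no_grad()
    def step(self):
        for group in self.base.param_groups:
            for param in group["params"]:
                if param.grad is None:
                    continue
                # Bernoulli keep-mask, rescaled so the expected update
                # equals the unmasked one (inverted-Dropout convention).
                mask = (torch.rand_like(param.grad) >= self.p).float()
                param.grad.mul_(mask / (1.0 - self.p))
        self.base.step()

# Usage: pair the mechanism with any base optimizer, e.g. Adam.
model = torch.nn.Linear(10, 1)
opt = RainWrapper(torch.optim.Adam(model.parameters(), lr=1e-3), drop_rate=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

Because the mask is resampled at every step and rescaled by 1/(1-p), the masked update matches the base optimizer's update in expectation, while the injected randomness perturbs the optimization trajectory, consistent with the abstract's finding that the gains appear on the training set rather than the test set.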