Computer Science ›› 2021, Vol. 48 ›› Issue (11A): 63-70. doi: 10.11896/jsjkx.201100032
LIU Hua-ling, PI Chang-peng, LIU Meng-yao, TANG Xin
Abstract: In machine learning, the loss function of a traditional model is convex, so a global optimum exists and can be reached by classical gradient descent. In deep learning, by contrast, the implicit form of the model function and the permutation symmetry of neurons within a layer make the loss function non-convex, so gradient descent cannot guarantee a global optimum; even advanced optimizers such as SGDM, Adam, Adagrad, and RMSprop cannot escape local optima, and although they converge considerably faster, they still fall short of practical needs. Existing optimizers are largely incremental fixes for the defects or limitations of earlier ones: they yield modest gains, and their performance is inconsistent across datasets. This paper proposes a new optimization mechanism, Rain, which adapts the Dropout mechanism of deep neural networks and applies it at the optimizer level. Rain is not an improved version of any existing optimizer but a third-party mechanism independent of all of them; it can be paired with any optimizer to improve its adaptability to a given dataset. The mechanism aims to optimize the model's performance on the training set; generalization to the test set is outside its scope. Experiments pairing two models, Deep Crossing and FM, with five optimizers on the Frappe and MovieLens datasets show that models equipped with Rain attain a markedly lower training loss and converge faster, while their test-set performance is nearly identical to that of the original models, i.e., generalization remains poor.
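The abstract does not spell out Rain's update rule, so the following is only a minimal sketch of the stated idea, assuming that, like Dropout, the mechanism randomly masks part of each update so that the wrapped optimizer moves a different random subset of parameters at every step. The class name RainWrapper, the drop_prob parameter, the gradient-level placement of the mask, and the Dropout-style rescaling are all illustrative assumptions, not the authors' specification.

```python
import torch

class RainWrapper:
    """Hypothetical sketch of a Rain-style mechanism: apply a
    Dropout-style random mask to the gradients before delegating
    the actual step to any wrapped optimizer. Illustrative only;
    the paper's abstract does not specify the exact update rule."""

    def __init__(self, optimizer, drop_prob=0.1):
        self.optimizer = optimizer          # e.g. SGDM, Adam, Adagrad, RMSprop
        self.keep_prob = 1.0 - drop_prob    # probability a coordinate survives

    def zero_grad(self):
        self.optimizer.zero_grad()

    @torch.no_grad()
    def step(self):
        for group in self.optimizer.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Bernoulli mask: each gradient coordinate is kept with
                # probability keep_prob and rescaled, as in Dropout, so the
                # expected magnitude of the update stays unchanged.
                mask = torch.bernoulli(torch.full_like(p.grad, self.keep_prob))
                p.grad.mul_(mask).div_(self.keep_prob)
        self.optimizer.step()

# Usage with any base optimizer, e.g. on an FM or Deep Crossing model:
#   base = torch.optim.Adam(model.parameters(), lr=1e-3)
#   opt = RainWrapper(base, drop_prob=0.1)
#   opt.zero_grad(); loss.backward(); opt.step()
```

Because the wrapper only touches param_groups, the same training loop works unchanged for any base optimizer, which mirrors the paper's claim that Rain is independent of, and composable with, all optimization algorithms.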