Computer Science ›› 2023, Vol. 50 ›› Issue (11): 220-226. doi: 10.11896/jsjkx.220900131

• Artificial Intelligence •

Adaptive Heavy-Ball Momentum Method Based on AdaGrad+ and Its Optimal Individual Convergence

WEI Hongxu1, LONG Sheng2, TAO Wei3, TAO Qing1

  1 Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031, China
    2 Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
    3 Institute of Evaluation and Assessment Research, PLA Academy of Military Science, Beijing 100091, China
  • Received: 2022-09-14 Revised: 2023-07-05 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: TAO Qing (qing.tao@ia.ac.cn)
  • About author: (1140064271@qq.com)

Adaptive Heavy-Ball Momentum Method Based on AdaGrad+ and Its Optimal Individual Convergence

WEI Hongxu1, LONG Sheng2, TAO Wei3, TAO Qing1   

  1 Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031, China
    2 Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
    3 Institute of Evaluation and Assessment Research, PLA Academy of Military Science, Beijing 100091, China
  • Received: 2022-09-14 Revised: 2023-07-05 Online: 2023-11-15 Published: 2023-11-06
  • About author: WEI Hongxu, born in 1998, postgraduate. His main research interests include pattern recognition and machine learning. TAO Qing, born in 1965, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include pattern recognition, machine learning and applied mathematics.

Abstract: Adaptive strategies and momentum methods are common techniques for improving the performance of optimization algorithms. Most current adaptive gradient methods adopt an AdaGrad-type strategy, but this strategy performs poorly on constrained optimization, which motivated the AdaGrad+ method as a better fit for constrained problems. However, like SGD, AdaGrad+ does not attain the optimal individual convergence rate in the non-smooth convex setting, and combining it with NAG momentum also fails to deliver the expected acceleration. To address these issues, this paper combines the step-size strategy of AdaGrad+ with the accelerated convergence of the Heavy-Ball momentum method and proposes an adaptive momentum method based on AdaGrad+. By introducing a weighted momentum term, carefully selecting time-varying parameters and handling the adaptive matrix flexibly, the method is proved to attain the optimal individual convergence rate for general non-smooth convex problems. Finally, the correctness of the theoretical analysis is verified by solving a typical hinge loss optimization problem under an l∞-norm constraint, and experiments on training deep convolutional neural networks show that the method also performs well in practice.
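For orientation, the display below sketches one generic form that a projected adaptive Heavy-Ball update can take. It is only an illustrative reconstruction from the abstract, not the exact scheme analyzed in the paper: the step size α_t, the momentum weight β_t, the adaptive matrix H_t and the constraint set Q are placeholder symbols, and an AdaGrad-style diagonal H_t is assumed purely for concreteness.

$$
w_{t+1} \;=\; P_Q^{H_t}\!\Big(w_t - \alpha_t H_t^{-1} g_t + \beta_t\,(w_t - w_{t-1})\Big),
\qquad
H_t \;=\; \delta I + \mathrm{diag}\!\Big(\sum_{s=1}^{t} g_s \odot g_s\Big)^{\!1/2},
$$

where $g_t$ is a (sub)gradient of the objective at $w_t$, $\odot$ denotes the element-wise product, $\delta>0$ is a small constant, and $P_Q^{H_t}$ is the projection onto the constraint set $Q$ in the norm induced by $H_t$. The paper's contribution lies in the specific weighted momentum term and time-varying parameter choices that make the individual iterates converge at the optimal rate, which are not reproduced here.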

Keywords: Convex optimization, Adaptive strategy, AdaGrad+, Heavy-Ball momentum method, Convergence rate

Abstract: Adaptive strategies and momentum methods are commonly used to improve the performance of optimization algorithms. Most adaptive gradient methods currently adopt an AdaGrad-type strategy, which performs poorly on constrained optimization; the AdaGrad+ method was therefore proposed as a better fit for constrained problems. However, like SGD, AdaGrad+ does not attain the optimal individual convergence rate in the non-smooth convex setting, and combining it with NAG momentum also fails to achieve the expected acceleration. To address these problems, this paper proposes an adaptive momentum method based on AdaGrad+. The method uses the AdaGrad+ strategy to adjust the step size and inherits the accelerated convergence of the Heavy-Ball momentum method. By setting a weighted momentum term, selecting time-varying parameters carefully and handling the adaptive matrix flexibly, the method is proved to achieve the optimal individual convergence rate for non-smooth convex problems. Finally, experiments on a typical hinge loss optimization problem under an l∞-norm constraint verify the correctness of the theoretical analysis, and deep learning experiments confirm that the proposed method also has good performance in practical applications.
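As a rough, runnable illustration of the experimental setting described above (hinge loss under an l∞-norm constraint), the sketch below combines an AdaGrad-style coordinate-wise step size with a Heavy-Ball momentum term and a projection onto the l∞ ball. It is an assumption-laden simplification, not the algorithm analyzed in the paper: the weighted momentum term, the time-varying parameters and the exact adaptive matrix used there are not reproduced, and all names and constants (train, alpha0, beta, radius) are hypothetical.

    # Illustrative sketch (not the paper's exact algorithm): projected stochastic
    # Heavy-Ball with an AdaGrad-style diagonal step size, applied to hinge loss
    # under an l_inf-ball constraint. alpha0, beta, eps and radius are
    # hypothetical parameters chosen only for this sketch.
    import numpy as np

    def hinge_subgradient(w, x, y):
        """Subgradient of max(0, 1 - y * <w, x>) with respect to w."""
        return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(w)

    def train(X, Y, radius=1.0, alpha0=0.1, beta=0.9, eps=1e-8, epochs=5, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        w_prev = np.zeros(d)
        grad_sq_sum = np.zeros(d)  # running sum of squared subgradient coordinates
        for _ in range(epochs):
            for i in rng.permutation(n):
                g = hinge_subgradient(w, X[i], Y[i])
                grad_sq_sum += g * g
                step = alpha0 / (np.sqrt(grad_sq_sum) + eps)   # coordinate-wise adaptive step
                w_next = w - step * g + beta * (w - w_prev)    # Heavy-Ball momentum term
                w_next = np.clip(w_next, -radius, radius)      # projection onto the l_inf ball
                w_prev, w = w, w_next
        return w

    # Usage on synthetic linearly separable data:
    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 20))
        w_true = rng.normal(size=20)
        Y = np.sign(X @ w_true)
        w_hat = train(X, Y)
        print("training accuracy:", np.mean(np.sign(X @ w_hat) == Y))

Note that for an l∞ (box) constraint the projection reduces to coordinate-wise clipping, which is why the sketch needs no specialized projection routine; constraint sets other than a box would require a different projection step.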

Key words: Convex optimization, Adaptive strategy, AdaGrad+, Heavy-Ball momentum method, Convergence rate

CLC Number: 

  • TP181
[1]ROBBINS H,MONRO S.A Stochastic Approximation Method[J].The Annals of Mathematical Statistics,1951,22(3):400-407.
[2]SUN T,LI D S,QUAN Z,et al.Heavy-ball Algorithms Always Escape Saddle Points[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence(IJCAI 2019).San Francisco:Morgan Kaufmann,2019:3520-3526.
[3]NESTEROV Y.A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k2)[J].Soviet Mathematics Doklady,1983,27(2):372-376.
[4]DUCHI J,HAZAN E,SINGER Y.Adaptive Subgradient Methods for Online Learning and Stochastic Optimization[J].Journal of Machine Learning Research,2011,12:2121-2159.
[5]TIELEMAN T,HINTON G.RMSProp:Divide the Gradient by a Running Average of its Recent Magnitude[D].Toronto:University of Toronto,2012.
[6]ZEILER M D.ADADELTA:An Adaptive Learning Rate Method[J].arXiv:1212.5701,2012.
[7]LEVY K Y.Online to offline conversions,universality and adaptive minibatch sizes[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems(NeurIPS).Cambridge,MA:MIT Press,2017:1613-1622.
[8]ENE A,NGUYEN H L,VLADU A.Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:7314-7321.
[9]POLYAK B T.Some Methods of Speeding up the Convergence of Iteration Methods[J].USSR Computational Mathematics and Mathematical Physics,1964,4(5):1-17.
[10]GHADIMI E,FEYZMAHDAVIAN H R,JOHANSSON M.Global Convergence of the Heavy-Ball Method for Convex Optimization[C]//Proceedings of the European Control Conference.Washington,USA:IEEE,2015:310-315.
[11]CHENG Y J,TAO W,LIU Y X,et al.Optimal Individual Convergence Rate of the Heavy-Ball-based Momentum Methods[J].Chinese Journal of Computer Research and Development,2019,56(8):1686-1694.
[12]TAO W,PAN Z S,CHU D J,et al.The Individual Convergence of Projected Subgradient Methods Using the Nesterov's Step-Size Strategy[J].Chinese Journal of Computers,2018,41(1):164-176.
[13]TAO W,PAN Z S,WU G W,et al.The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization[J].IEEE Transactions on Neural Networks and Learning Systems,2020,31(7):2557-2568.
[14]KINGMA D P,BA J.Adam:A Method for Stochastic Optimization[J].arXiv:1412.6980,2014.
[15]REDDI S J,KALE S,KUMAR S.On the Convergence of Adam and Beyond[J].arXiv:1904.09237,2019.
[16]TAO W,LONG S,WU G W,et al.The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-Ball Methods[J].arXiv:2102.07314,2021.
[17]ZHANG Z D,LONG S,BAO L,et al.AdaBelief Based Heavy-Ball Momentum Method[J].Pattern Recognition and Artificial Intelligence,2022,35(2):106-115.
[18]ZHUANG J T,TANG T,DING Y F,et al.AdaBelief Optimizer:Adapting Stepsizes by the Belief in Observed Gradients [J].arXiv:2010.07468,2020.
[19]ZINKEVICH M.Online Convex Programming and Generalized Infinitesimal Gradient Ascent[C]//Proceedings of the 20th International Conference on Machine Learning.New York:ACM,2003.
[20]AGARWAL A,BARTLETT P L,RAVIKUMAR P,et al.Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization[J].IEEE Transactions on Information Theory,2012,58(5):3235-3249.
[21]SHAMIR O,ZHANG T.Stochastic Gradient Descent for Non-smooth Optimization:Convergence Results and Optimal Averaging Schemes[C]//Proceedings of the 29th International Conference on Machine Learning.New York,USA:ACM,2013:71-79.
[22]ZOU F Y,SHEN L,JIE Z Q,et al.A Sufficient Condition for Convergences of Adam and RMSprop[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Washington,USA,2019:11119-11127.
[23]SHALEV S S,SINGER Y,SREBRO N,et al.Pegasos:Primal estimated sub-gradient solver for svm[J].Mathematical Programming,2011,127(1):3-30.
[24]RAKHLIN A,SHAMIR O,SRIDHARAN K.Making gradient descent optimal for strongly convex stochastic optimization[C]//Proceedings of the 29th International Conference on Machine Learning.New York,USA:ACM,2012:449-456.
[25]LIU J,JI S,YE J.SLEP:Sparse learning with efficient projections[D].Arizona:Arizona State University,2009.