Computer Science ›› 2023, Vol. 50 ›› Issue (11): 220-226. doi: 10.11896/jsjkx.220900131

• Artificial Intelligence •

Adaptive Heavy-Ball Momentum Method Based on AdaGrad+ and Its Optimal Individual Convergence

WEI Hongxu1, LONG Sheng2, TAO Wei3, TAO Qing1   

  1 Department of Information Engineering,Army Academy of Artillery and Air Defense of PLA,Hefei 230031,China
    2 Laboratory for Big Data and Decision,College of Systems Engineering, National University of Defense Technology,Changsha 410073,China
    3 Institute of Evaluation and Assessment Research,PLA Academy of Military Science,Beijing 100091,China
  • Received:2022-09-14 Revised:2023-07-05 Online:2023-11-15 Published:2023-11-06
  • About author:WEI Hongxu,born in 1998,postgraduate.His main research interests include pattern recognition and machine learning. TAO Qing,born in 1965,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include pattern recognition,machine learning and applied mathematics.

Abstract: Adaptive strategies and momentum methods are commonly used to improve the performance of optimization algorithms. At present, most adaptive gradient methods rely on AdaGrad-type strategies. The AdaGrad+ method, which is better suited to constrained problems, was proposed to remedy the inefficiency of AdaGrad-type strategies on constrained optimization. However, in the non-smooth convex case its individual convergence rate is only that of SGD, so the optimal individual convergence rate is not attained, and combining this strategy with NAG momentum fails to deliver the expected acceleration. To address these problems, this paper proposes an adaptive momentum method based on AdaGrad+. The method uses the AdaGrad+ strategy to adjust the step size and inherits the acceleration of the Heavy-Ball momentum method. By setting a weighted momentum term, choosing time-varying parameters carefully and handling the adaptive matrix flexibly, the method is proved to achieve the optimal individual convergence rate for non-smooth convex problems. Finally, experiments are conducted on the typical optimization problem of the hinge loss function with an l1-norm constraint, and the results verify the correctness of the theoretical analysis. In addition, deep learning experiments confirm that the proposed method also performs well in practical applications.
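To make the kind of update described in the abstract concrete, the following Python sketch combines a projected Heavy-Ball step with an adaptive step size built from accumulated gradient norms. It is an illustration only, not the paper's algorithm: the scalar AdaGrad-style step-size rule, the momentum schedule beta_t = (t-1)/(t+1), the l2-ball feasible set and the helper names (project_l2_ball, adaptive_heavy_ball, hinge_subgrad) are assumptions made for brevity, whereas the paper uses the AdaGrad+ adaptive matrix, a weighted momentum term and time-varying parameters chosen specifically to guarantee the optimal individual convergence rate.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    # Euclidean projection onto the l2 ball of the given radius
    # (stand-in for the constrained feasible set used in the paper).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def adaptive_heavy_ball(subgrad, x0, T=1000, eta=1.0, radius=1.0):
    # Projected Heavy-Ball iteration with an AdaGrad-style scalar step size.
    # subgrad(x) must return a subgradient of the (possibly non-smooth)
    # convex objective at x; the last iterate x_T is returned.
    x_prev = x_curr = np.asarray(x0, dtype=float)
    grad_norm_sq = 0.0                               # running sum of ||g_s||^2
    for t in range(1, T + 1):
        g = subgrad(x_curr)
        grad_norm_sq += float(np.dot(g, g))
        eta_t = eta / np.sqrt(1.0 + grad_norm_sq)    # adaptive step size (assumed form)
        beta_t = (t - 1.0) / (t + 1.0)               # assumed time-varying momentum weight
        y = x_curr - eta_t * g + beta_t * (x_curr - x_prev)
        x_prev, x_curr = x_curr, project_l2_ball(y, radius)
    return x_curr

# Toy usage: average hinge loss on synthetic data, a non-smooth convex objective.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
labels = np.sign(A @ rng.normal(size=10))

def hinge_subgrad(w):
    # Subgradient of (1/n) * sum_i max(0, 1 - labels_i * <A_i, w>).
    active = (1.0 - labels * (A @ w)) > 0
    return -(A.T @ (active * labels)) / len(labels)

w_hat = adaptive_heavy_ball(hinge_subgrad, np.zeros(10), T=2000)
```

The sketch returns the last iterate rather than an average of iterates, since individual (last-iterate) convergence is the property analysed in the paper.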

Key words: Convex optimization, Adaptive strategy, AdaGrad+, Heavy-Ball momentum method, Convergence rate

CLC Number: TP181

[1]ROBBINS H,MONRO S.A Stochastic Approximation Method[J].The Annals of Mathematical Statistics,1951,22(3):400-407.
[2]SUN T,LI D S,QUAN Z,et al.Heavy-ball Algorithms Always Escape Saddle Points[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence(IJCAI 2019).San Francisco:Morgan Kaufmann,2019:3520-3526.
[3]NESTEROV Y.A Method of Solving a Convex Programming Problem with Convergence Rate O(1/k2)[J].Soviet Mathematics Doklady,1983,27(2):372-376.
[4]DUCHI J,HAZAN E,SINGER Y.Adaptive Subgradient Methods for Online Learning and Stochastic Optimization[J].Journal of Machine Learning Research,2011,12:2121-2159.
[5]TIELEMAN T,HINTON G.RMSProp:Divide the Gradient by a Running Average of its Recent Magnitude[D].Toronto:University of Toronto,2012.
[6]ZEILER M D.ADADELTA:An Adaptive Learning Rate Method[J].arXiv:1212.5701,2012.
[7]LEVY K Y.Online to offline conversions,universality and adaptive minibatch sizes[C]//Proceedings of the 31st Annual Conf.on Neural Information Processing Systems(NeurIPS).Cambridge,MA:MIT Press,2017:1613-1622.
[8]ENE A,NGUYEN H L,VLADU A.Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:7314-7321.
[9]POLYAK B T.Some Methods of Speeding up the Convergence of Iteration Methods[J].USSR Computational Mathematics and Mathematical Physics,1964,4(5):1-17.
[10]GHADIMI E,FEYZMAHDAVIAN H R,JOHANSSON M.Global Convergence of the Heavy-Ball Method for Convex Optimization[C]//Proceedings of the European Control Conference.Washington,USA:IEEE,2015:310-315.
[11]CHENG Y J,TAO W,LIU Y X,et al.Optimal Individual Convergence Rate of the Heavy-Ball-based Momentum Methods[J].Chinese Journal of Computer Research and Development,2019,56(8):1686-1694.
[12]TAO W,PAN Z S,CHU D J,et al.The Individual Convergence of Projected Subgradient Methods Using the Nesterov's Step-Size Strategy[J].Chinese Journal of Computers,2018,41(1):164-176.
[13]TAO W,PAN Z S,WU G W,et al.The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization[J].IEEE Transactions on Neural Networks and Learning Systems,2020,31(7):2557-2568.
[14]KINGMA D P,BA J.Adam:A Method for Stochastic Optimization[J].arXiv:1412.6980,2014.
[15]REDDI S J,KALE S,KUMAR S.On the Convergence of Adam and Beyond[J].arXiv:1904.09237,2019.
[16]TAO W,LONG S,WU G W,et al.The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-Ball Methods[J].arXiv:2102.07314,2021.
[17]ZHANG Z D,LONG S,BAO L,et al.AdaBelief Based Heavy-Ball Momentum Method[J].Pattern Recognition and Artificial Intelligence,2022,35(2):106-115.
[18]ZHUANG J T,TANG T,DING Y F,et al.AdaBelief Optimizer:Adapting Stepsizes by the Belief in Observed Gradients [J].arXiv:2010.07468,2020.
[19]ZINKEVICH M.Online Convex Programming and Generalized Infinitesimal Gradient Ascent[C]//Proceedings of the 20th International Conference on Machine Learning.New York:ACM,2003.
[20]AGARWAL A,BARTLETT P L,RAVIKUMAR P,et al.Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization[J].IEEE Transactions on Information Theory,2012,58(5):3235-3249.
[21]SHAMIR O,ZHANG T.Stochastic Gradient Descent for Non-smooth Optimization:Convergence Results and Optimal Averaging Schemes[C]//Proceedings of the 29th International Conference on Machine Learning.New York,USA:ACM,2013:71-79.
[22]ZOU F Y,SHEN L,JIE Z Q,et al.A Sufficient Condition for Convergences of Adam and RMSprop[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Washington,USA,2019:11119-11127.
[23]SHALEV-SHWARTZ S,SINGER Y,SREBRO N,et al.Pegasos:Primal estimated sub-gradient solver for SVM[J].Mathematical Programming,2011,127(1):3-30.
[24]RAKHLIN A,SHAMIR O,SRIDHARAN K.Making gradient descent optimal for strongly convex stochastic optimization[C]//Proceedings of the 29th International Conference on Machine Learning.New York,USA:ACM,2012:449-456.
[25]LIU J,JI S,YE J.SLEP:Sparse learning with efficient projections[D].Arizona:Arizona State University,2009.