Computer Science ›› 2016, Vol. 43 ›› Issue (9): 47-51. doi: 10.11896/j.issn.1002-137X.2016.09.008

• The Third CCF Conference on Big Data, 2015 •

Online Learning Based on Stochastic Spectral Gradient

XUE Wei, ZHANG Wen-sheng and REN Jun-hong

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Software, Beihang University, Beijing 100191, China
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the General Program of the National Natural Science Foundation of China (61472423), the Key Program of the National Natural Science Foundation of China (U1135005,8), and the Graduate Research and Innovation Program of Jiangsu Province (KYZZ15_0123).

Abstract: We consider a class of learning problems whose objective function can be formulated as the average of a large number of component functions, where each component function is assumed to be smooth. Among the many machine learning methods, online learning provides a favorable tool for learning from big data: its procedure is simple, it converges quickly, and it updates the model automatically as data arrive. To solve such problems, we develop a stochastic spectral gradient descent (S2GD) method, which employs the Rayleigh quotient to collect second-order information of the objective and thereby construct approximations of the inverse Hessian. S2GD can be viewed as an extension of the spectral gradient method from deterministic optimization to stochastic optimization. The search direction generated at each iteration is a descent direction, and existing results imply that the algorithm converges. Preliminary experiments on LIBSVM data sets demonstrate that the proposed approach is feasible and effective.
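The idea sketched in the abstract is to scale a stochastic gradient by a scalar approximation of the inverse Hessian obtained from the Rayleigh quotient of successive iterate and gradient differences, i.e., a Barzilai-Borwein-type spectral step applied to an objective of the form f(w) = (1/n) * sum_i f_i(w). The Python sketch below illustrates this idea on l2-regularized logistic regression; the sampling scheme, step-size safeguards, and all function and parameter names (component_grad, s2gd_sketch, alpha_min, alpha_max) are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

def component_grad(w, x_i, y_i, lam):
    """Gradient of one component: logistic loss on (x_i, y_i) plus an l2 term."""
    z = y_i * x_i.dot(w)
    return -y_i * x_i / (1.0 + np.exp(z)) + lam * w

def s2gd_sketch(X, y, lam=1e-4, epochs=5, alpha0=0.1,
                alpha_min=1e-4, alpha_max=10.0, seed=0):
    """Stochastic gradient descent with a Barzilai-Borwein-type spectral step (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_prev = w.copy()
    g_prev = component_grad(w, X[0], y[0], lam)
    alpha = alpha0
    for _ in range(epochs):
        for _ in range(n):
            i = rng.integers(n)                     # sample one component uniformly
            g = component_grad(w, X[i], y[i], lam)  # stochastic gradient
            s, yk = w - w_prev, g - g_prev          # iterate / gradient differences
            sy = s.dot(yk)
            if sy > 1e-12:
                # The Rayleigh quotient y^T s / s^T s estimates the curvature along s;
                # its reciprocal serves as a scalar inverse-Hessian approximation.
                alpha = float(np.clip(s.dot(s) / sy, alpha_min, alpha_max))
            w_prev, g_prev = w.copy(), g
            w = w - alpha * g                       # spectral-scaled descent step
    return w

if __name__ == "__main__":
    # Toy usage on synthetic data with labels in {-1, +1}.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    w_true = rng.standard_normal(10)
    y = np.sign(X.dot(w_true))
    w_hat = s2gd_sketch(X, y)
    print("training accuracy:", np.mean(np.sign(X.dot(w_hat)) == y))
```

Clipping the spectral step into [alpha_min, alpha_max] is a common safeguard in this setting, since the curvature estimate s^T y is noisy when successive gradients come from different sampled components.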

Key words: Online learning, Stochastic optimization, Convex optimization, Stochastic gradient, Spectral gradient

