Computer Science, 2016, Vol. 43, Issue 9: 47-51. doi: 10.11896/j.issn.1002-137X.2016.09.008


Online Learning Based on Stochastic Spectral Gradient

XUE Wei, ZHANG Wen-sheng and REN Jun-hong   

Online: 2018-12-01    Published: 2018-12-01

Abstract: We consider a class of learning problems whose objective function can be formulated as the average of a large number of component functions, where each component function is assumed to be smooth. Among machine learning methods, online learning provides a favorable tool for big-data learning, not only because of its simple operation and fast convergence rate, but also because of its ability to update the model automatically. To solve such problems, we develop a stochastic spectral gradient descent (S2GD) method, which employs the Rayleigh quotient to collect second-order information and construct approximations to the inverse Hessian. S2GD can be viewed as an extension of the spectral gradient method from the deterministic setting to the stochastic setting. At each iteration, the generated search direction is guaranteed to be a descent direction. Existing results show that the S2GD method is convergent. Preliminary experimental results on LIBSVM data sets demonstrate the feasibility and effectiveness of the proposed approach.
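The full paper is not reproduced on this page, but the abstract pins down the core mechanism: a Barzilai-Borwein (spectral) step size alpha_k = (s'_{k-1} s_{k-1}) / (s'_{k-1} y_{k-1}), whose reciprocal is a Rayleigh quotient of the sampled average Hessian, applied to mini-batch gradients. The sketch below illustrates only that mechanism under stated assumptions; the function name s2gd_sketch, the safeguard interval [alpha_min, alpha_max], and the curvature check are illustrative choices, not the authors' actual algorithm.

```python
import numpy as np

def s2gd_sketch(grad, x0, n, iters=1000, batch=32, alpha0=0.1,
                alpha_min=1e-4, alpha_max=1.0, seed=0):
    """Sketch of a stochastic spectral (Barzilai-Borwein) gradient method.

    grad(x, idx) should return the average gradient of the component
    functions indexed by idx; n is the number of components.  The scalar
    step alpha_k = (s's)/(s'y) is the reciprocal of a Rayleigh quotient
    of the sampled average Hessian, so 1/alpha_k serves as a scalar
    Hessian approximation; it is safeguarded to [alpha_min, alpha_max]
    to keep the stochastic iteration stable.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    alpha = alpha0
    for _ in range(iters):
        idx = rng.integers(n, size=batch)   # sample one mini-batch
        g = grad(x, idx)                    # stochastic gradient at x
        x_new = x - alpha * g               # step along -g: a descent direction
        y = grad(x_new, idx) - g            # same batch at both points
        s = x_new - x                       # displacement
        sy = float(s @ y)
        if sy > 0:                          # curvature check keeps alpha > 0
            alpha = float(s @ s) / sy       # BB step: inverse Rayleigh quotient
            alpha = min(max(alpha, alpha_min), alpha_max)
        x = x_new
    return x

if __name__ == "__main__":
    # Toy finite-sum least-squares problem: f(x) = (1/2n) * ||A x - b||^2.
    rng = np.random.default_rng(1)
    A = rng.normal(size=(500, 20))
    x_true = rng.normal(size=20)
    b = A @ x_true
    grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
    x_hat = s2gd_sketch(grad, np.zeros(20), n=500)
    print("distance to x*:", np.linalg.norm(x_hat - x_true))
```

Reusing the same mini-batch at x and x_new when forming y mirrors how semi-stochastic quasi-Newton schemes keep the curvature pair (s, y) consistent; sampling a fresh batch for y would add noise to the step-size estimate.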

Key words: Online learning, Stochastic optimization, Convex optimization, Stochastic gradient, Spectral gradient

