Computer Science, 2016, Vol. 43, Issue (Z6): 1-16. DOI: 10.11896/j.issn.1002-137X.2016.6A.001


Research and Development on Structured Sparse Models and Algorithms

LIU Jian-wei, CUI Li-peng and LUO Xiong-lin   

Online: 2018-12-01    Published: 2018-12-01

Abstract: Group sparse models have many important applications in statistics, signal processing and machine learning. A group sparse model achieves feature group selection by introducing a sparsity-inducing penalty function into the objective function; interestingly, some group sparse models can achieve feature group selection and feature selection within groups simultaneously. According to the penalty functions, group sparse models fall mainly into two categories, i.e., Group Lasso models and group sparse models with non-convex penalties. This paper systematically summarized the important group sparse models and analyzed the differences and relations between the various group sparse models. In addition, we summarized and compared their statistical properties (such as model selection consistency, parameter estimation consistency and the oracle property) and the algorithms for solving the various group sparse models. Roughly speaking, the Group Lasso models include the standard Group Lasso model, the L∞,1-penalty Group Lasso model, the overlapping Group Lasso model, the tree-guided Group Lasso model, the multiple-output tree-guided Group Lasso model, the mixed Group Lasso model, the adaptive Group Lasso model, the logistic Group Lasso model and the Bayesian Group Lasso model. Algorithms for solving group sparse models include Group LARS, the block coordinate descent method (block coordinate ascent method), the active set method, the interior point method, the projected gradient method, the spectral projected gradient method, the alternating direction method of multipliers and the block coordinate gradient descent method. We carried out a detailed analysis of these algorithms for specific group sparse models. Before applying the optimization methods above, the objective function must be pretreated, i.e., the nonsmooth, nonconvex and non-separable penalty function in the objective function of a group sparse model must be transformed into a smooth, convex and separable function. Variational inequalities, Nesterov's smooth approximation technique, local first-order approximation by Taylor series expansion, local quadratic approximation, and the dual norm and dual function are often used in this step. Next, some recently proposed group sparse models were introduced, such as the Group Lasso model based on the generalized additive model, the composite group bridge model, the group square-root Lasso model, the Group Lasso model based on the Tobit model, and so on. Finally, we discussed future research directions for group sparse models.
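
As a concrete illustration of the sparsity-inducing penalties described above, the following sketch writes out the standard Group Lasso objective and the sparse Group Lasso variant that performs group selection and within-group feature selection simultaneously. The notation (design matrix X, response y, G non-overlapping groups of sizes p_g) is ours, not taken from the surveyed paper:

    \min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\|\beta_g\|_2            (Group Lasso)

    \min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\|y - X\beta\|_2^2 + (1-\alpha)\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\|\beta_g\|_2 + \alpha\lambda\|\beta\|_1, \quad 0 \le \alpha \le 1            (sparse Group Lasso)

Here \beta_g denotes the sub-vector of coefficients in group g. The unsquared \ell_2 norm on each block induces sparsity at the group level (an entire \beta_g is either set to zero or retained), while the additional \ell_1 term in the second objective induces sparsity within the retained groups.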
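
The block coordinate descent solver mentioned above also admits a compact sketch: holding all other groups fixed, each block update of the Group Lasso reduces to a group soft-thresholding (proximal) step when the columns within each group are orthonormalized. The code below is a minimal illustration under that assumption; the function and variable names are ours, not drawn from any library discussed in the paper.

import numpy as np

def group_soft_threshold(z, t):
    # Proximal operator of t*||.||_2: shrink the whole block toward zero,
    # zeroing it out entirely when its norm does not exceed the threshold.
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

def group_lasso_bcd(X, y, groups, lam, n_iter=200):
    # Block coordinate descent for 0.5*||y - X b||^2 + lam*sum_g sqrt(p_g)*||b_g||_2,
    # assuming X[:, g].T @ X[:, g] = I for every group g, so each block update is closed form.
    beta = np.zeros(X.shape[1])
    residual = y - X @ beta
    for _ in range(n_iter):
        for g in groups:                       # g is an index array for one group
            Xg = X[:, g]
            r_g = residual + Xg @ beta[g]      # partial residual excluding group g
            z = Xg.T @ r_g                     # unpenalized block solution
            beta_new = group_soft_threshold(z, lam * np.sqrt(len(g)))
            residual = r_g - Xg @ beta_new     # keep the full residual up to date
            beta[g] = beta_new
    return beta

A typical call passes groups as a list of index arrays, e.g. groups = [np.arange(0, 3), np.arange(3, 6)]. When a group's columns are not orthonormal, the closed-form step is usually replaced by an inner gradient or root-finding update, as in the block coordinate (gradient) descent variants surveyed in the paper.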
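
For the pretreatment step described above (turning the nonsmooth, non-separable penalty into a smooth function), a standard instance is Nesterov's smoothing applied through the dual-norm representation of each group penalty. This is a generic construction from the smoothing literature, sketched here in our own notation rather than as a formula reproduced from the paper:

    \|\beta_g\|_2 = \max_{\|u_g\|_2 \le 1} u_g^{\top}\beta_g,
    \qquad
    f_{\mu}(\beta_g) = \max_{\|u_g\|_2 \le 1} \Big\{ u_g^{\top}\beta_g - \frac{\mu}{2}\|u_g\|_2^2 \Big\}.

The smoothed penalty f_{\mu} is convex and differentiable with a (1/\mu)-Lipschitz gradient, and it satisfies f_{\mu}(\beta_g) \le \|\beta_g\|_2 \le f_{\mu}(\beta_g) + \mu/2, so the smoothing parameter \mu trades approximation accuracy against the step sizes available to gradient-based solvers such as the (spectral) projected gradient and accelerated proximal methods listed in the abstract.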

Key words: Sparsity, Group sparsity, Penalty function, Group Lasso, Feature group selection, Feature selection within group, Algorithm

