Computer Science ›› 2016, Vol. 43 ›› Issue (Z6): 1-16. doi: 10.11896/j.issn.1002-137X.2016.6A.001

• Intelligent Computing •

Research and Development on Structured Sparse Models and Algorithms

LIU Jian-wei, CUI Li-peng and LUO Xiong-lin

  1. Department of Automation, China University of Petroleum (Beijing), Beijing 102249, China
  • Online: 2018-12-01    Published: 2018-12-01
  • Supported by:
    This work was supported by the Basic Subject Research Fund of China University of Petroleum (Beijing) (JCXK-2011-07).

Abstract: The group sparse model has many important applications in statistics, signal processing and machine learning. It achieves feature group selection by introducing a group-sparsity-inducing penalty function into the objective function. Interestingly, some group sparse models can achieve feature group selection and feature selection within groups simultaneously. According to the penalty function used, group sparse models are mainly divided into two categories, i.e., Group Lasso models and group sparse models with non-convex penalties. This paper systematically summarizes the important group sparse models, analyzes the differences and relations between them, and summarizes and compares their statistical properties (such as model selection consistency, parameter estimation consistency and the oracle property) as well as the algorithms for solving them. Roughly speaking, the Group Lasso models include the ordinary Group Lasso model, the L∞,1 penalty Group Lasso model, the overlapping Group Lasso model, the tree-guided Group Lasso model, the multiple-output tree-guided Group Lasso model, the mixed Group Lasso model, the adaptive Group Lasso model, the logistic Group Lasso model and the Bayesian Group Lasso model. The group sparse models with non-convex penalties include the group SCAD model, the group bridge model and the group MCP model. Algorithms for solving group sparse models include Group LARS, the block coordinate descent (ascent) method, the active set method, the interior point method, the projected gradient method, the spectral projected gradient method, the alternating direction method of multipliers and the block coordinate gradient descent method; these algorithms are analyzed in detail in combination with specific group sparse models. Before these optimization methods are applied, the objective function usually needs to be preprocessed, i.e., the non-smooth, non-convex and block-coordinate non-separable objective function of a group sparse model is transformed towards a smooth, convex and block-coordinate separable one. Variational inequalities, Nesterov's smoothing approximation technique, local first-order Taylor approximation, local quadratic approximation, the dual norm and the dual function are often used in this step. Next, some recently proposed group sparse models are introduced, such as the Group Lasso model for generalized additive models, the composite group bridge model, the group square-root Lasso model and the Group Lasso model for the Tobit model. Finally, future research directions of group sparse models are discussed.
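
To make the penalty functions discussed in the abstract concrete, the following is a minimal sketch of the two model families in generic notation; the symbols y, X, β, the group index sets G_1,…,G_m, the group weights w_g and the parameters λ, α, γ are generic choices made here for illustration and are not taken from any single reference.

  % Group Lasso (cf. [2]): penalizing the Euclidean norm of each coefficient group
  % yields feature group selection.
  \min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \sum_{g=1}^{m} w_g \|\beta_{G_g}\|_2

  % Sparse Group Lasso (cf. [29]): the added L1 term additionally gives feature
  % selection within the selected groups.
  \min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + (1-\alpha)\lambda \sum_{g=1}^{m} w_g \|\beta_{G_g}\|_2 + \alpha\lambda \|\beta\|_1

  % Group MCP (cf. [43,52]): a non-convex penalty applied to the group norm t = \|\beta_{G_g}\|_2.
  \rho_{\lambda,\gamma}(t) =
    \begin{cases}
      \lambda t - t^2/(2\gamma), & 0 \le t \le \gamma\lambda,\\
      \gamma\lambda^2/2,         & t > \gamma\lambda,
    \end{cases}
  \qquad \gamma > 1

Replacing the MCP function ρ with the SCAD penalty yields the group SCAD model in the same way.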

Key words: Sparsity, Group sparsity, Penalty function, Group Lasso, Feature group selection, Feature selection within group, Algorithm
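
As an illustration of the block coordinate descent method listed in the abstract, below is a minimal Python sketch for the Group Lasso with non-overlapping groups. It assumes, for simplicity, that the columns of the design matrix are orthonormal within each group (X_g^T X_g = I), so that each block update reduces to a closed-form group soft-thresholding step; the function names, the toy data and the choice lam=0.5 are illustrative only and are not taken from any of the cited works.

import numpy as np

def group_soft_threshold(z, t):
    """Block soft-thresholding: shrink the vector z toward zero by t in Euclidean norm."""
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

def group_lasso_bcd(X, y, groups, lam, n_iter=200):
    """Block coordinate descent for
        (1/2) * ||y - X beta||_2^2 + lam * sum_g ||beta_g||_2,
    assuming X_g^T X_g = I for every group, so each block minimization
    is solved exactly by group soft-thresholding.
    `groups` is a list of index arrays defining non-overlapping groups."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for g in groups:
            Xg = X[:, g]
            # partial residual with the current group removed from the fit
            r = y - X @ beta + Xg @ beta[g]
            # closed-form update of the whole coefficient block
            beta[g] = group_soft_threshold(Xg.T @ r, lam)
    return beta

if __name__ == "__main__":
    # toy example: 3 groups of 2 features, only the first group is truly active
    rng = np.random.default_rng(0)
    X, _ = np.linalg.qr(rng.normal(size=(60, 6)))      # orthonormal columns
    beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
    y = X @ beta_true + 0.01 * rng.normal(size=60)
    groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
    print(group_lasso_bcd(X, y, groups, lam=0.5).round(3))

For a general design matrix the block subproblem no longer has a closed form, and one falls back on the block coordinate gradient descent or proximal methods surveyed above.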

[1] Tibshirani R.Regression shrinkage and selection via the lasso[J].Journal of the Royal Statistical Society:Series B (Methodological),1996,58(1):267-288
[2] Yuan M,Lin Y.Model selection and estimation in regressionwith grouped variables[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2006,68(1):49-67
[3] Huang J,Zhang T.The benefit of group sparsity[J].The Annals of Statistics,2010,38(4):1978-2004
[4] Bach F R.Consistency of the Group Lasso and multiple kernel learning[J].The Journal of Machine Learning Research,2008,9:1179-1225
[5] Wei F,Huang J.Consistent group selection in high-dimensional linear regression[J].Bernoulli:official journal of the Bernoulli Society for Mathematical Statistics and Probability,2010,16(4):1369-1384
[6] Lounici K,Pontil M,Van De Geer S,et al.Oracle inequalities and optimal inference under group sparsity[J].The Annals of Statistics,2011,39(4):2164-2204
[7] Liu H,Zhang J.Estimation consistency of the Group Lasso and its applications[C]∥Proceedings of the 12th International Conference on Artificial Intelligence and Statistics.Florida,USA:the MIT Press,2009:376-383
[8] Roth V,Fischer B.The group-lasso for generalized linear mo-dels:uniqueness of solutions and efficient algorithms[C]∥ Proceedings of the 25th International Conference on Machine Learning.Helsinki,Finland:ACM,2008:848-855
[9] Yang Y,Zou H.A Fast Unified Algorithm for Solving Group-Lasso Penalized Learning Problems[J].Journal of Computational and Graphical Statistics,2015,25(6):1129-1141
[10] Qin Z,Scheinberg K,Goldfarb D.Efficient block-coordinate descent algorithms for the Group Lasso[J].Mathematical Programming Computation,2013,5(2):143-169
[11] Rakotomamonjy A.Surveying and comparing simultaneous sparse approximation (or group-Lasso) algorithms[J].Signal Processing,2011,91(7):1505-1526
[12] Turlach B A,Venables W N,Wright S J.Simultaneous variable selection[J].Technometrics,2005,47(3):349-363
[13] Tropp J A.Algorithms for simultaneous sparse approximation Part II:Convex relaxation[J].Signal Processing,2006,86(3):589-602
[14] Quattoni A,Collins M,Darrell T.Transfer learning for image classification with sparse prototype representations[C]∥Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Alaska,USA:IEEE,2008:1-8
[15] Schmidt M W,Murphy K P,Fung G,et al.Structure learning in random fields for heart motion abnormality detection[C]∥Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Alaska,USA:IEEE,2008:1-8
[16] Quattoni A,Carreras X,Collins M,et al.An efficient projection for L1,∞ regularization[C]∥Proceedings of the 26th Annual International Conference on Machine Learning.Quebec,Canada:ACM,2009:857-864
[17] Vogt J E,Roth V.The group-Lasso:l1,∞ regularization versus l1,2 regularization[C]∥ Proceedings of the 32nd DAGM conference on Pattern recognition.Darmstadt,Germany:Springer-Verlag,2010:252-261
[18] Jenatton R,Audibert J Y,Bach F.Structured variable selection with sparsity-inducing norms[J].The Journal of Machine Learning Research,2011,12:2777-2824
[19] Grave E,Obozinski G R,Bach F R.Trace lasso:a trace norm regularization for correlated designs[C]∥Proceedings of Advances in Neural Information Processing Systems 24:25th Annual Conference on Neural Information Processing Systems.Granada,Spain:the MIT Press,2011:2187-2195
[20] Percival D.Theoretical properties of the overlapping groups Lasso[J].Electronic Journal of Statistics,2012,6(1):269-288
[21] Yuan L,Liu J,Ye J.Efficient Methods for Overlapping Group Lasso[J].IEEE transactions on pattern analysis and machine intelligence,2013,35(9):2104-2116
[22] Zhao P,Rocha G,Yu B.Grouped and hierarchical model selection through composite absolute penalties[J].The Annals of Statistics,2009,37(6A):3468-3497
[23] Jenatton R,Mairal J,Obozinski G,et al.Proximal methods for hierarchical sparse coding[J].Journal of Machine Learning Research,2011,12(2):2297-2334
[24] Liu J,Ye J.Moreau-Yosida regularization for grouped tree structure learning[C]∥Proceedings of Advances in Neural Information Processing Systems 23:24th Annual Conference on Neural Information Processing Systems 2010.Vancouver,British Columbia,Canada:Curran Associates,2010:1459-1467
[25] Jawanpuria P,Nath J S,Ramakrishnan G.Generalized hierarchical kernel learning[J].The Journal of Machine Learning Research,2015,16:617-652
[26] Kim S,Xing E P.Tree-guided Group Lasso for multi-task regression with structured sparsity[C]∥ Proceedings of the 27th International Conference on Machine Learning.Haifa,Israel:Omnipress,2010:543-550
[27] Kim S,Xing E P.Tree-guided Group Lasso for multi-response regression with structured sparsity,with an application to eQTL mapping[J].The Annals of Applied Statistics,2012,6(3):1095-1117
[28] Sprechmann P,Bronstein A M,Sapiro G.Learning efficient sparse and low rank models[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(9):1821-1833
[29] Simon N,Friedman J,Hastie T,et al.A sparse-Group Lasso[J].Journal of Computational and Graphical Statistics,2013,22(2):231-245
[30] Chatterjee S,Steinhaeuser K,Banerjee A,et al.Sparse Group Lasso:Consistency and Climate Applications[C]∥Proceedings of the 12th SIAM International Conference on Data Mining.California,USA:Omnipress,2012:47-58
[31] Wang H,Leng C.A note on adaptive Group Lasso[J].Computational Statistics and Data Analysis,2008,52(12):5277-5286
[32] Fan J,Ma Y,Dai W.Nonparametric independence screening in sparse ultra-high-dimensional additive models[J].Journal of the American Statistical Association,2014,109(507):1270-1284
[33] Meier L,Van De Geer S,Bühlmann P.The Group Lasso for logistic regression[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2008,70(1):53-71
[34] Kim S,Xing E P.Tree-guided Group Lasso for multi-task regression with structured sparsity[C]∥Proceedings of the 27th International Conference on Machine Learning.Haifa,Israel:Omnipress,2010:543-550
[35] Sun H,Wang S.Penalized logistic regression for high-dimensional DNA methylation data with case-control studies[J].Bioinformatics,2012,28(10):1368-1375
[36] Raman S,Fuchs T J,Wild P J,et al.The Bayesian group-Lasso for analyzing contingency tables[C]∥Proceedings of the 26th Annual International Conference on Machine Learning.Montreal,Canada:ACM,2009:881-888
[37] Chandran M.Analysis of Bayesian Group-Lasso in Regression Models[D].Florida:University of Florida,2011
[38] Wang L,Chen G,Li H.Group SCAD regression analysis for microarray time course gene expression data[J].Bioinformatics,2007,23(12):1486-1494
[39] Jiang D.Concave selection in generalized linear models[D].Iowa:University of Iowa,USA,2012
[40] Huang J,Breheny P,Ma S.A selective review of group selection in high-dimensional models[J].Statistical Science,2012,27(4):481-499
[41] Geng Z,Wang S,Yu M.Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study[J].Biometrics,2015,71(1):53-62
[42] Breheny P,Huang J.Penalized methods for bi-level variable selection[J].Statistics and its Interface,2009,2(3):369-380
[43] Breheny P,Huang J.Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors[J].Statistics and Computing,2015,25(2):173-187
[44] Zou H.The adaptive Lasso and its oracle properties[J].Journal of the American Statistical Association,2006,101(476):1418-1429
[45] Fan J,Li R.Variable selection via nonconcave penalized likelihood and its oracle properties[J].Journal of the American Statistical Association,2001,96(456):1348-1360
[46] Zhao P,Yu B.On model selection consistency of Lasso[J].The Journal of Machine Learning Research,2006,7:2541-2563
[47] Zhang C H,Huang J.The sparsity and bias of the Lasso selection in high-dimensional linear regression[J].The Annals of Statistics,2008,36(4):1567-1594
[48] Bickel P J,Ritov Y,Tsybakov A B.Simultaneous analysis of Lasso and Dantzig selector[J].The Annals of Statistics,2009,37(4):1705-1732
[49] Parikh N,Boyd S P.Proximal Algorithms[J].Foundations and Trends in optimization,2014,1(3):127-239
[50] Zhou Z,Zhang Q,So A M.L1,p-Norm Regularization:Error Bounds and Convergence Rate Analysis of First-Order Methods[C]∥Proceedings of the 32nd International Conference on Machine Learning (ICML-15).Lille,France:MIT Press,2015:1501-1510
[51] Rakotomamonjy A,Flamary R,Gasso G,et al.Lp-Lq penalty for sparse linear and sparse multiple kernel multi-task learning[J].IEEE Transactions on Neural Networks,2011,22(8):1307-1320
[52] Zhang C H.Nearly unbiased variable selection under minimaxconcave penalty[J].Annals of Statistics,2010,38(2):894-942
[53] Fu W J.Penalized regressions:the bridge versus the Lasso[J].Journal of Computational and Graphical Statistics,1998,7(3):397-416
[54] Nesterov Y.Smooth minimization of non-smooth functions[J].Mathematical Programming,2005,103(1):127-152
[55] Boyd S,Parikh N,Chu E.Distributed optimization and statistical learning via the alternating direction method of multipliers[J].Foundations and Trends in Machine Learning,2011,3(1):1-122
[56] Efron B,Hastie T,Johnstone I.Least angle regression[J].The Annals of statistics,2004,32(2):407-499
[57] Fu W J.Penalized regressions:the bridge versus the Lasso[J].Journal of Computational and Graphical Statistics,1998,7(3):397-416
[58] Fornasier M,Rauhut H.Recovery algorithms for vector-valued data with joint sparsity constraints[J].SIAM Journal on Numerical Analysis,2008,46(2):577-613
[59] Chen X,Lin Q,Kim S.Smoothing proximal gradient method for general structured sparse regression[J].The Annals of Applied Statistics,2012,6(2):719-752
[60] Tseng P,Yun S.A coordinate gradient descent method for nonsmooth separable minimization[J].Computational Optimization and Applications,2010,47(2):179-206
[61] Beck A,Teboulle M.A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J].SIAM Journal on Imaging Sciences,2009,2(1):183-202
[62] Simon N,Tibshirani R.Standardization and the group Lasso penalty[J].Statistica Sinica,2012,22(3):983-1001
[63] Percival D.Theoretical properties of the overlapping groups Lasso[J].Electronic Journal of Statistics,2012,6(5):269-288
[64] Bunea F,Lederer J,She Y.The Group Square-Root Lasso:Theoretical Properties and Fast Algorithms[J].IEEE Transactions on Information Theory,2014,60(2):1313-1325
[65] Belloni A,Chernozhukov V,Wang L.Square-root Lasso:pivotal recovery of sparse signals via conic programming[J].Biometrika,2011,98(4):791-806
[66] Zhou J,Liu J,Narayan V A.Modeling disease progression via fused sparse Group Lasso[C]∥Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2012:1095-1103
[67] Tibshirani R,Saunders M,Rosset S.Sparsity and smoothness via the fused Lasso[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2005,67(1):91-108
[68] Seetharaman I.Consistent bi-level variable selection via composite group bridge regression[D].Kansas State:Kansas State University,2013
[69] Nardi Y,Rinaldo A.On the asymptotic properties of the group lasso estimator for linear models[J].Electronic Journal of Statistics,2008,2(1):605-633
[70] Liu X,Wang Z,Wu Y.Group variable selection and estimation in the tobit censored response model[J].Computational Statistics and Data Analysis,2013,60(2):80-89
[71] W Zhan-feng,W Yao-hua,Z Lin-cheng.A LASSO-Type Approach to Variable Selection and Estimation for Censored Regression Model[J].Chinese Journal of Applied Probability and Statistics,2010,26(1):66-80
[72] Yin J,Chen X,Xing E P.Group Sparse Additive Models[C]∥Proceedings of the 29th International Conference on Machine Learning.Scotland,UK:Omnipress,2012:871-878
[73] Ravikumar P,Lafferty J,Liu H.Sparse additive models[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2009,71(5):1009-1030
[74] Meinshausen N,Bühlmann P.Stability selection[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2010,72(4):417-473
[75] Borgi M A,Labate D,El Arbi M.Sparse multi-stage regularized feature learning for robust face recognition[J].Expert Systems with Applications,2015,42(1):269-279
[76] Zhang T.Multi-stage convex relaxation for feature selection[J].Bernoulli,2013,19(5B):2277-2293
[77] Asif M S,Romberg J.Fast and accurate algorithms for re-weighted L1-norm minimization[J].IEEE Transactions on Signal Processing,2013,61(23):5905-5916
[78] Geman D,Yang C.Nonlinear image recovery with half-quadratic regularization[J].IEEE Transactions on Image Processing,1995,4(7):932-946
[79] Zhao Y B,Li D.Reweighted L1-Minimization for Sparse Solutions to Underdetermined Linear Systems[J].SIAM Journal on Optimization,2012,22(3):1065-1088
[80] Simon N,Friedman J,Hastie T.A sparse-group lasso[J].Journal of Computational and Graphical Statistics,2013,22(2):231-245
[81] Bühlmann P.Statistical significance in high-dimensional linear models[J].Bernoulli,2013,19(4):1212-1242
[82] Zhang C H,Zhang S S.Confidence intervals for low dimensional parameters in high dimensional linear models[J].Journal of the Royal Statistical Society:Series B (Statistical Methodology),2014,76(1):217-242
[83] Ye F,Zhang C H.Rate Minimaxity of the Lasso and Dantzig Selector for the Lq Loss in Lr Balls[J].The Journal of Machine Learning Research,2010,11:3519-3540
