Computer Science (计算机科学) ›› 2016, Vol. 43 ›› Issue (8): 177-182. doi: 10.11896/j.issn.1002-137X.2016.08.036
赵宇,陈锐,刘蔚
ZHAO Yu, CHEN Rui and LIU Wei
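Abstract: To integrate feature selection into the support vector machine classifier, an optimized SVM classifier with built-in feature selection is proposed: FS-SDP-SVM (Feature Selection in Semi-definite Program for Support Vector Machine). The model maps each feature separately into kernel space and then combines the resulting kernels through model parameters into a new kernel matrix. Feature selection and classification are thereby unified under a single optimization objective, so that both are optimized at the same time. For feature screening, two measures derived from the model parameters are proposed, feature support and feature contribution; controlling their upper and lower bounds allows a flexible trade-off between optimal classification and the fewest features. In the experiments, two variants of the integrated feature selection algorithm, optimal classification (FS-SDP-SVM1) and fewest features (FS-SDP-SVM2), are compared with the Relief-F, SFS and SBS algorithms on UCI machine learning datasets and on synthetic data. The results show that, while maintaining good generalization ability, the proposed FS-SDP-SVM algorithm achieves the highest classification accuracy or the smallest number of features on most of the experimental datasets; on the synthetic data, the method accurately selects the true features and removes the noise features.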
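The kernel construction described in the abstract (one kernel per feature, combined through parameters into a single kernel matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the RBF kernel, the `gamma` value, the hand-picked weights `mu`, and the helper names `feature_kernel`, `combined_kernel` and `select_features` are hypothetical. In the paper the combination parameters come from solving a semi-definite program, which this sketch does not do, and the simple weight threshold below is only a stand-in for the feature support and feature contribution measures, whose definitions the abstract does not give.

```python
import math


def feature_kernel(X, j, gamma=1.0):
    """Gram matrix of an RBF kernel computed on feature j alone.

    The RBF form and gamma are illustrative assumptions; the abstract
    only says each feature is mapped separately in kernel space.
    """
    n = len(X)
    return [
        [math.exp(-gamma * (X[a][j] - X[b][j]) ** 2) for b in range(n)]
        for a in range(n)
    ]


def combined_kernel(X, mu, gamma=1.0):
    """Weighted sum of the per-feature Gram matrices.

    mu[j] is the (hypothetical) weight of feature j. With non-negative
    weights, a sum of positive semi-definite kernels stays positive
    semi-definite, so the result is a valid kernel matrix.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for j, weight in enumerate(mu):
        if weight == 0.0:
            continue  # a zero weight removes the feature from the kernel
        Kj = feature_kernel(X, j, gamma)
        for a in range(n):
            for b in range(n):
                K[a][b] += weight * Kj[a][b]
    return K


def select_features(mu, min_weight):
    """Keep features whose weight reaches a lower bound.

    A stand-in for the paper's screening rule, not its actual definition.
    """
    return [j for j, weight in enumerate(mu) if weight >= min_weight]


if __name__ == "__main__":
    X = [[0.0, 5.0], [1.0, 3.0], [2.0, 9.0]]  # 3 samples, 2 features
    mu = [0.9, 0.1]                            # weights fixed by hand
    K = combined_kernel(X, mu)
    print(K[0][0])                   # diagonal entry: sum of the weights
    print(select_features(mu, 0.5))  # indices of the retained features
```

Raising `min_weight` in this sketch keeps fewer features, while lowering it keeps more; that mirrors, in a very simplified way, the trade-off between the fewest features and optimal classification that the abstract attributes to the bounds on feature support and feature contribution.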