Computer Science, 2016, Vol. 43, Issue (8): 177-182, 215. doi: 10.11896/j.issn.1002-137X.2016.08.036

Research on Optimal Support Vector Classifier Model Integrating Feature Selection

ZHAO Yu, CHEN Rui and LIU Wei   

Online: 2018-12-01 Published: 2018-12-01

Abstract: To incorporate feature selection into the support vector machine classifier, this paper proposes a new model, feature selection in semi-definite programming for support vector machine (FS-SDP-SVM), which integrates the objectives of feature selection and classification. The key idea is to split the kernel space into several subspaces, one per feature. A new kernel matrix is constructed as a linear combination of these subspaces and optimized jointly with the support vector classifier by semi-definite programming. Two feature-selection parameters are introduced, the feature supporter and the feature contributor, which can be adjusted flexibly either to maximize accuracy (FS-SDP-SVM1) or to minimize the number of features (FS-SDP-SVM2). The empirical study compares the two model variants with the feature selection algorithms Relief-F, SFS and SBS on UCI machine learning datasets and on synthetic data. The results show that FS-SDP-SVM achieves the highest accuracy or the smallest number of features on most of the UCI datasets while retaining good generalization ability. On the synthetic data, the method accurately removes the noise data and preserves the real features.

Key words: Feature selection, Ensemble method, Support vector classifier, Sub-kernel space, Semi-definite programming
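
The sub-kernel construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF form of each per-feature kernel, the parameter gamma, and the function names sub_kernel and combined_kernel are assumptions made here for illustration. In the paper the combination weights are obtained by semi-definite programming together with the classifier; in this sketch they are simply fixed by hand.

```python
import math

def sub_kernel(X, j, gamma=1.0):
    """Kernel matrix built from feature j alone (assumed RBF form)."""
    n = len(X)
    return [[math.exp(-gamma * (X[a][j] - X[b][j]) ** 2) for b in range(n)]
            for a in range(n)]

def combined_kernel(X, weights, gamma=1.0):
    """Linear combination of per-feature sub-kernels: K = sum_j w_j * K_j.

    The weights are hand-picked here (hypothetical values); the paper
    learns them by semi-definite programming.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for j, w in enumerate(weights):
        if w == 0.0:
            continue  # a zero weight removes feature j from the kernel
        Kj = sub_kernel(X, j, gamma)
        for a in range(n):
            for b in range(n):
                K[a][b] += w * Kj[a][b]
    return K

# Three samples, two features; the second feature is treated as noise.
X = [[0.0, 5.0], [0.1, -3.0], [2.0, 4.0]]
K = combined_kernel(X, weights=[1.0, 0.0])

# Same first feature, different second feature: the kernel is unchanged.
X_alt = [[0.0, -9.0], [0.1, 7.0], [2.0, 0.5]]
K_alt = combined_kernel(X_alt, weights=[1.0, 0.0])
```

Setting a weight to zero drops the corresponding feature from the combined kernel, which is how a weighted sum of sub-kernels can express feature selection. Keeping the weights non-negative keeps the combined matrix positive semi-definite, since each RBF sub-kernel is.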

