Computer Science (计算机科学) ›› 2018, Vol. 45 ›› Issue (6A): 16-21.
黄铉
HUANG Xuan
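Abstract: The quality of data features directly affects model accuracy. In pattern recognition, feature dimensionality reduction has long attracted researchers' attention. With the arrival of the big-data era, data volumes have grown sharply and data dimensionality keeps rising. When applied to high-dimensional data, traditional data mining methods degrade or even fail. Practice has shown that reducing the dimensionality of the features before analysis is an effective way to avoid the "curse of dimensionality". Dimensionality reduction is widely used across many fields. This paper describes in detail two different classes of dimensionality reduction methods, feature extraction and feature selection, and compares their characteristics. It then summarizes and analyzes the most representative feature selection algorithms in terms of two key processes: the subset search strategy and the evaluation criterion. Finally, from the perspective of practical applications, it discusses research directions in feature dimensionality reduction that deserve attention.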
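As a rough illustration of the two key processes named in the abstract, the sketch below pairs one possible subset search strategy (greedy sequential forward selection) with one possible filter-style evaluation criterion (relevance to the target minus redundancy among the chosen features, both measured by absolute Pearson correlation). It is not taken from the paper: the function names, the choice of criterion, and the toy data are all assumptions made for this example.

```python
from statistics import mean
from typing import Callable, List, Sequence

Column = Sequence[float]


def pearson(a: Column, b: Column) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def relevance_minus_redundancy(
    columns: Sequence[Column], target: Column, subset: List[int]
) -> float:
    """Hypothetical evaluation criterion: mean |corr(feature, target)| over the
    subset, minus mean |corr| over all pairs of features inside the subset."""
    relevance = mean(abs(pearson(columns[j], target)) for j in subset)
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    redundancy = (
        mean(abs(pearson(columns[i], columns[j])) for i, j in pairs) if pairs else 0.0
    )
    return relevance - redundancy


def sequential_forward_selection(
    columns: Sequence[Column],
    target: Column,
    k: int,
    criterion: Callable[
        [Sequence[Column], Column, List[int]], float
    ] = relevance_minus_redundancy,
) -> List[int]:
    """Search strategy: start from the empty set and repeatedly add the feature
    whose inclusion gives the highest criterion value, until k are chosen."""
    selected: List[int] = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: criterion(columns, target, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): f1 is a scaled copy of f0, and the target depends mostly on f2.
f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f1 = [2.0 * x for x in f0]
f2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
target = [a + 3.0 * c for a, c in zip(f0, f2)]

chosen = sequential_forward_selection([f0, f1, f2], target, k=2)
print(chosen)
```

Swapping in a different `criterion` (for example, the cross-validated accuracy of a classifier) would turn this filter-style sketch into a wrapper-style one while leaving the search strategy unchanged, which is why the two processes are usually discussed separately.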