Computer Science (计算机科学), 2021, Vol. 48, Issue (8): 53-59. doi: 10.11896/jsjkx.200700211
杨蕾, 降爱莲, 强彦
YANG Lei, JIANG Ai-lian, QIANG Yan
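Abstract: High-dimensional data contain many redundant and irrelevant features. These features seriously degrade the efficiency and quality of data mining and the generalization performance of machine learning algorithms, which is why feature selection has become an important research direction in computer science and technology. This paper exploits the nonlinear learning ability of autoencoders to propose an unsupervised feature selection algorithm. First, the reconstruction error of an autoencoder is used to select a subset of features whose individual contribution to data reconstruction is large. Second, the feature weights of a single-layer autoencoder are used to select the final subset of features that contribute most to reconstructing the other features. Manifold regularization preserves the local and non-local structure of the original data space, and an L2,1-norm sparsity regularizer on the feature weights makes those weights sparser, so that more discriminative features are selected. Finally, a new objective function is constructed and optimized by gradient descent. Experiments on six typical datasets of different types compare the proposed algorithm with five commonly used unsupervised feature selection algorithms. The results show that the proposed algorithm effectively selects important features and significantly improves both classification accuracy and clustering accuracy.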
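The abstract does not state the exact objective, so the following is only a minimal sketch of the second stage under stated assumptions: a one-hidden-layer autoencoder with a sigmoid encoder and a linear decoder, trained by full-batch gradient descent on J = 1/(2n)||X_hat - X||_F^2 + (alpha/n) tr(H^T L H) + beta ||W1||_{2,1}, where L is the Laplacian of a k-nearest-neighbour graph and feature i is scored by the L2 norm of row i of the encoder weights W1. The function names (`knn_laplacian`, `ae_feature_scores`, `select_features`), the graph construction, the Glorot-style initialization, and all hyperparameter values are hypothetical choices for illustration, not the authors' formulation. The sketch models only local structure (no non-local term) and omits the first, reconstruction-error-based stage.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalised Laplacian L = D - S of a symmetric 0/1 k-nearest-neighbour graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]               # indices of the k nearest samples
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    S = np.maximum(S, S.T)                             # make the graph undirected
    return np.diag(S.sum(axis=1)) - S


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ae_feature_scores(X, n_hidden=8, alpha=0.1, beta=0.05, lr=0.1,
                      n_iter=500, k=5, eps=1e-8, seed=0):
    """Hypothetical sketch: minimise
        J = 1/(2n)||X_hat - X||_F^2 + (alpha/n) tr(H^T L H) + beta * ||W1||_{2,1}
    by gradient descent and score feature i by the L2 norm of row i of W1."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    lim = np.sqrt(6.0 / (d + n_hidden))                # Glorot/Xavier uniform range (assumed)
    W1 = rng.uniform(-lim, lim, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-lim, lim, size=(n_hidden, d))
    b2 = np.zeros(d)
    L = knn_laplacian(X, k)
    losses = []
    for _ in range(n_iter):
        # Forward pass: sigmoid encoder, linear decoder, residual R = X_hat - X.
        H = sigmoid(X @ W1 + b1)
        R = H @ W2 + b2 - X
        row_norms = np.sqrt(np.sum(W1 ** 2, axis=1) + eps)  # smoothed row norms for L2,1
        losses.append(0.5 / n * np.sum(R ** 2)
                      + alpha / n * np.trace(H.T @ L @ H)
                      + beta * np.sum(row_norms))
        # Backward pass; L is symmetric, so d tr(H^T L H) / dH = 2 L H.
        dR = R / n
        gW2 = H.T @ dR
        gb2 = dR.sum(axis=0)
        dH = dR @ W2.T + (2.0 * alpha / n) * (L @ H)
        dZ = dH * H * (1.0 - H)                        # chain rule through the sigmoid
        gW1 = X.T @ dZ + beta * W1 / row_norms[:, None]  # L2,1 gradient shrinks whole rows
        gb1 = dZ.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return np.linalg.norm(W1, axis=1), losses


def select_features(scores, m):
    """Indices of the m highest-scoring features, best first."""
    return np.argsort(-scores)[:m]
```

For example, `scores, _ = ae_feature_scores(X)` followed by `select_features(scores, 50)` would keep the 50 features whose encoder rows have the largest norms. Because the L2,1 term acts on entire rows of W1, a feature that does not help reconstruction has its whole row driven toward zero, which is what makes the row norm usable as a ranking score. Gradients are written out by hand here only to avoid depending on a deep-learning framework.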