计算机科学 ›› 2023, Vol. 50 ›› Issue (7): 72-81.doi: 10.11896/jsjkx.221000143

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于对偶流形重排序的无监督特征选择算法

梁云辉1,2, 甘舰文1,2, 陈艳3, 周芃4, 杜亮1,2   

  1. 1 山西大学计算机与信息技术学院 太原 030006
    2 山西大学大数据科学与产业研究院 太原 030006
    3 四川大学计算机学院 成都 610065
    4 安徽大学计算机科学与技术学院 合肥 230601
  • 收稿日期:2022-10-18 修回日期:2022-11-12 出版日期:2023-07-15 发布日期:2023-07-05
  • 通讯作者: 杜亮(duliang@sxu.edu.cn)
  • 作者简介:(2350629530@qq.com)
  • 基金资助:
    国家自然科学基金面上项目(61976129,62176001)

Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking

LIANG Yunhui1,2, GAN Jianwen1,2, CHEN Yan3, ZHOU Peng4, DU Liang1,2   

  1. 1 College of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Institute of Big Data Science and Industry,Shanxi University,Taiyuan 030006,China
    3 College of Computer,Sichuan University,Chendu 610065,China
    4 College of Computer Science and Technology,Anhui University,Hefei 230601,China
  • Received:2022-10-18 Revised:2022-11-12 Online:2023-07-15 Published:2023-07-05
  • About author:LIANG Yunhui,born in 1998,postgra-duate,is a member of China Computer Federation.His main research interests include data mining and feature selection.DU Liang,born in 1985,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include data mining,feature selection and clustering analysis.
  • Supported by:
    National Natural Science Foundation of China(61976129,62176001).

摘要: 在许多数据分析任务中,经常会遇到高维数据。特征选择技术旨在从原始高维数据中找到最具代表性的特征,但由于缺乏类标签信息,相比有监督场景,在无监督学习场景中选择合适的特征困难得多。传统的无监督特征选择方法通常依据某些准则对样本的特征进行评分,在这个过程中样本是被无差别看待的。然而这样做并不能完全捕捉数据的内在结构,不同样本的重要性应该是有差异的,并且样本权重与特征权重之间存在一种对偶关系,它们会互相影响。为此,提出了一种基于对偶流形重排序的无监督特征选择算法(Unsupervised Feature Selection Algorithm based on Dual Manifold Re-Ranking,DMRR),分别构建不同的相似性矩阵来刻画样本与样本、特征与特征、样本与特征的流形结构,并结合样本与特征的初始得分进行流形上的重排序。将DMRR与3种原始无监督特征选择算法以及2种无监督特征选择后处理算法进行比较,实验结果表明样本重要性信息、样本与特征之间的对偶关系有助于实现更优的特征选择。

关键词: 对偶, 流形学习, 重排序, 特征选择, 无监督学习

Abstract: High dimensional data is often encountered in many data analysis tasks.Feature selection techniques aim to find the most representative features from the original high-dimensional data.Due to the lack of class label information,it is much more difficult to select suitable features in unsupervised learning scenarios than in supervised scenarios.Traditional unsupervised feature selection methods usually score the features of samples according to certain criteria in which samples are treated indiscriminately.However,these approaches cannot capture the internal structure of data completely.The importance of different samples should vary.There is a dual relationship between weight of sample and feature that will influence each other.Therefore,an unsupervised feature selection algorithm based on dual manifold re-ranking(DMRR) is proposed in this paper.Different similarity matrices are constructed to depict the manifold structures on samples and samples,features and features,and samples and features respectively.Then manifold re-ranking is carried out by combining the initial scores of samples and features.By comparing DMRR with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms,experimental results verify that importance information of different samples and the dual relationship between sample and feature are helpful to achieve better feature selection.

Key words: Dual, Manifold learning, Re-ranking, Feature selection, Unsupervised learning

中图分类号: 

  • TP181
[1]WU X,XU X,LIU J,et al.Supervised Feature Selection withOrthogonal Regression and Feature Weighting[J].IEEE Tran-sactions on Neural Networks and Learning Systems,2020,32(5):1831-1838.
[2]ZHANG H,GONG M,NIE F,et al.Unified Dual-Label Semi-Supervised Learning with Top-k Feature Selection[J].Neurocomputing,2022,501:875-888.
[3]CHANDRASHEKAR G,SAHIN F.A Survey on Feature Selection Methods[J].Computers & Electrical Engineering,2014,40(1):16-28.
[4]ZHU P,HOU X,WANG Z,et al.Compactness Score:A Fast Filter Method for Unsupervised Feature Selection[J].arXiv:2201.13194,2022.
[5]KOHAVI R,JOHN G H.Wrappers for Feature Subset Selec-tion[J].Artificial Intelligence,1997,97(1/2):273-324.
[6]ZHAO H,DU L,WEI J,et al.Local Sensitive Dual Concept Factorization for Unsupervised Feature Selection[J].IEEE Access,2020,8:133128-133143.
[7]ZHANG Y,ZHANG Z,LI S,et al.Unsupervised NonnegativeAdaptive Feature Extraction for Data Representation[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(12):2423-2440.
[8]LI J,CHENG K,WANG S,et al.Feature Selection:A Data Pers-pective[J].ACM Computing Surveys(CSUR),2017,50(6):1-45.
[9]HE X,CAI D,NIYOGI P.Laplacian Score for Feature Selection[C]//Advances in Neural Information Processing Systems.2005:507-514.
[10]YAO C,LIU Y F,JIANG B,et al.LLE Score:A New Filter-Based Unsupervised Feature Selection Method Based on Nonli-near Manifold Embedding and Its Application to Image Recognition[J].IEEE Transactions on Image Processing,2017,26(11):5257-5269.
[11]LIM H,KIM D W.Pairwise Dependence-Based UnsupervisedFeature Selection[J].Pattern Recognition,2021,111:107663.
[12]CAI D,ZHANG C,HE X.Unsupervised Feature Selection for Multi-Cluster Data[C]//Proceedings of The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2010:333-342.
[13]HE X,JI M,ZHANG C,et al.A Variance Minimization Crite-rion to Feature Selection Using Laplacian Regularization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(10):2013-2025.
[14]WANG D,NIE F,HUANG H.Feature Selection via Global Redundancy Minimization[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(10):2743-2755.
[15]NIE F,YANG S,ZHANG R,et al.A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization[J].IEEE Transactions on Image Processing,2018,28(5):2428-2438.
[16]KRIEGEL H P,KRÖGER P,SANDER J,et al.Density-Based Clustering[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2011,1(3):231-240.
[17]LI C,WANG X,DONG W,et al.Joint Active Learning with Feature Selection via Cur Matrix Decomposition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018,41(6):1382-1396.
[18]XIA R,PAN Z,XU F.Instance Weighting for Domain Adaptation via Trading off Sample Selection Bias and Variance[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence.2018:13-19.
[19]GU Q,ZHOU J.Co-Clustering on Manifolds[C]//Proceedings of The 15th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.2009:359-368.
[20]SONG K,YAO X,NIE F,et al.Weighted Bilateral K-means Algorithm for Fast Co-Clustering and Fast Spectral Clustering[J].Pattern Recognition,2021,109:107560.
[21]LUO M,NIE F,CHANG X,et al.Adaptive Unsupervised Feature Selection with Structure Regularization[J].IEEE Transactions on Neural Networks and Learning Systems,2017,29(4):944-956.
[22]LI J,CHEN J,QI F,et al.Two-Dimensional Unsupervised Feature Selection via Sparse Feature Filter[J/OL].IEEE Transactions on Cybernetics,2022.https://www.researchgate.net/publication/359887581_Two-Dimensional_Unsupervised_Feature_Selection_via_Sparse_Feature_Filter.
[23]LI W,CHEN H,LI T,et al.Unsupervised Feature Selection via Self-Paced Learning and Low-Redundant Regularization[J].Knowledge-Based Systems,2022,240:108150.
[24]WAHID A,KHAN D M,HUSSAIN I,et al.Unsupervised Feature Selection with Robust Data Reconstruction(UFS-RDR) and Outlier Detection[J].Expert Systems with Applications,2022,201:117008.
[25]XIE J,WANG M,XU S,et al.The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis[J].Frontiers in Genetics,2021,12:684100.
[26]BAO F,LIN Y,LI Y,et al.An Online Multi Tag Stream Feature Selection Algorithm Combining Neighborhood Information and Tag Correlation [J].Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition),2023,35(1):79-89.
[27]LV X L,DU L,ZHOU P,et al.Unsupervised Feature Selection Algorithm Framework Based on Neighborhood Interval Distur-bance Fusion [J].Journal of Nanjing University of Science and Technology,2021,45(4):420-428.
[28]HAN K,WANG Y,ZHANG C,et al.Autoencoder Inspired Unsupervised Feature Selection[C]//2018 IEEE International Conference on Acoustics,Speech and Signal Processing(IC-ASSP).IEEE,2018:2941-2945.
[29]BEIRANVAND F,MEHRDAD V,DOWLATSHAHI M B.Unsupervised Feature Selection for Image Classification:A Bipartite Matching-Based Principal Component Analysis Approach[J].Knowledge-Based Systems,2022,250:109085.
[30]ZHAO H,LI Q,WANG Z,et al.Joint Adaptive Graph Learning and Discriminative Analysis for Unsupervised Feature Selection[J].Cognitive Computation,2022,14(3):1211-1221.
[31]MIAO J,YANG T,SUN L,et al.Graph Regularized LocallyLinear Embedding for Unsupervised Feature Selection[J].Pattern Recognition,2022,122:108299.
[32]MACQUEEN J.Classification and Analysis of Multivariate Observations[C]//5th Berkeley Symposium on Mathematical,Statistics and Probability.1967:281-297.
[33]LI X,ZHANG H,WANG R,et al.Multiview Clustering:AScalable and Parameter-Free Bipartite Graph Fusion Method[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,44(1):330-344.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!