Computer Science, 2023, Vol. 50, Issue (10): 37-47. doi: 10.11896/jsjkx.230600038
CAO Dongtao1, SHU Wenhao1, QIAN Jin2
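Abstract: Feature selection can effectively remove redundant and irrelevant features from high-dimensional data while retaining the important ones. This reduces the computational complexity of a model and improves its accuracy. Noisy data such as outliers and boundary points may degrade classification performance during feature selection. To address this, a feature selection method based on rough sets and density peak clustering is proposed. First, density peak clustering is used to remove noisy data and identify the cluster centers. Then, following the idea of rough set theory, the data are partitioned according to the cluster centers. A feature importance measure is defined under the assumption that points in the same cluster should have the same label. Finally, a heuristic feature selection algorithm is designed to select the feature subset that makes the cluster structure purer. The proposed algorithm was compared with other algorithms on six UCI datasets in terms of classification accuracy, number of selected features, and running time. The experimental results verify its effectiveness and efficiency.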
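The three steps above can be illustrated with a small, self-contained sketch. It uses the standard density peak clustering quantities of Rodriguez and Laio: a local density ρ, the distance δ to a denser point, and cluster centers chosen by a large ρ·δ. The following parts are hypothetical stand-ins, because the abstract does not give the authors' definitions:

- the noise threshold `rho_min`;
- the nearest-center partition in each candidate feature subspace;
- the label-agreement score `purity`.

The function names (`density_peaks`, `select_centers`, `purity`, `greedy_select`) and the toy data are likewise illustrative only.

```python
import math

def dist(a, b, feats):
    """Euclidean distance between samples a and b restricted to feature indices feats."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))

def density_peaks(X, dc):
    """Local density rho and separation delta, as in density peak clustering.

    rho[i]  : number of other samples closer than the cutoff distance dc (cut-off kernel).
    delta[i]: distance to the nearest sample ranked denser than i; for the densest
              sample, its largest distance to any sample. Ties in rho are broken by index.
    """
    n, all_feats = len(X), range(len(X[0]))
    D = [[dist(X[i], X[j], all_feats) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and D[i][j] < dc) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # stable sort: densest first
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(D[i])
        else:
            delta[i] = min(D[i][j] for j in order[:pos])
    return rho, delta

def select_centers(rho, delta, k, rho_min):
    """Hypothetical noise rule: drop samples with rho < rho_min, then take the k
    remaining samples with the largest gamma = rho * delta as cluster centers."""
    kept = [i for i in range(len(rho)) if rho[i] >= rho_min]
    centers = sorted(kept, key=lambda i: rho[i] * delta[i], reverse=True)[:k]
    return kept, centers

def purity(X, y, kept, centers, feats):
    """Hypothetical importance score for the feature subset feats: assign each kept
    sample to its nearest center in that subspace and return the fraction whose
    label matches the center's label ("same cluster, same label")."""
    hits = 0
    for i in kept:
        c = min(centers, key=lambda m: dist(X[i], X[m], feats))
        hits += int(y[i] == y[c])
    return hits / len(kept)

def greedy_select(X, y, kept, centers, eps=1e-12):
    """Forward selection: add the feature that most increases purity; stop when
    no remaining feature improves it by more than eps."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {f: purity(X, y, kept, centers, selected + [f]) for f in remaining}
        f_best = max(remaining, key=scores.get)
        if scores[f_best] <= best + eps:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best

# Toy data: feature 0 separates the classes, feature 1 does not; sample 6 is an outlier.
X = [(0.0, 0.0), (0.1, 5.0), (0.2, 0.1),
     (10.0, 0.0), (10.1, 5.0), (10.2, 0.1),
     (50.0, 50.0)]
y = [0, 0, 0, 1, 1, 1, 0]
rho, delta = density_peaks(X, dc=6.0)
kept, centers = select_centers(rho, delta, k=2, rho_min=1)
selected, score = greedy_select(X, y, kept, centers)
print(kept, centers, selected, score)  # -> [0, 1, 2, 3, 4, 5] [0, 3] [0] 1.0
```

On the toy data, the outlier (sample 6) has zero density and is discarded as noise. Only feature 0 is selected, because adding feature 1 does not raise purity above 1.0. Fixing the centers once on the full feature space keeps each candidate evaluation cheap. Re-running the clustering for every candidate subset is another possible design.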