Computer Science ›› 2023, Vol. 50 ›› Issue (10): 37-47. doi: 10.11896/jsjkx.230600038

• Granular Computing & Knowledge Discovery •

Feature Selection Algorithm Based on Rough Set and Density Peak Clustering

CAO Dongtao¹, SHU Wenhao¹, QIAN Jin²

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  2. School of Software, East China Jiaotong University, Nanchang 330013, China
  • Received: 2023-06-04  Revised: 2023-07-28  Online: 2023-10-10  Published: 2023-10-10
  • About author: CAO Dongtao, born in 1997, master. His main research interests include machine learning, data mining, rough set, etc. SHU Wenhao, born in 1985, Ph.D., associate professor, master supervisor. Her main research interests include data mining, knowledge discovery, rough set, etc.
  • Supported by:
    National Natural Science Foundation of China (62266018, 61966016), Jiangxi Province Natural Science Foundation (20202BABL202037, 20232ACB202013, 20232BAB202052) and Jiangxi Postgraduate Innovation Fund Project (YC2022-s547).

Abstract: Feature selection removes redundant and irrelevant features from high-dimensional data while retaining important ones, thereby reducing the computational complexity of a model and improving its accuracy. However, noisy data such as outliers and boundary points can degrade classification performance during feature selection. To handle such data, a feature selection method based on rough sets and density peak clustering is proposed. First, noisy data are removed by density peak clustering and the cluster centers are identified. Then, drawing on rough set theory, the data are partitioned according to the cluster centers, and a feature importance measure is defined under the assumption that data points in the same cluster share the same label. Finally, a heuristic feature selection algorithm is designed to select the feature subset that yields a purer, more homogeneous cluster structure. The proposed algorithm is compared with other algorithms on six UCI datasets in terms of classification accuracy, number of selected features, and running time, and the experimental results verify its effectiveness and efficiency.
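
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the rough-set-based importance measure, so plain cluster purity (the share of points whose class matches the majority class of their cluster) stands in for it, and noise is approximated as points whose local density falls below a threshold. The clustering part follows the standard density peak procedure of Rodriguez and Laio (local density rho, distance delta to the nearest denser point, centers chosen by the largest rho * delta). All function names and the parameters d_c, n_centers and rho_min are hypothetical.

```python
import math
from collections import Counter


def density_peaks(points, d_c, n_centers, rho_min=1):
    """Standard density peak clustering on a list of numeric tuples.

    Returns (centers, labels, noise) as lists of indices / cluster ids.
    rho_min is an assumed noise threshold, not taken from the paper.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those visited earlier (denser, or tied).
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centers combine high density with a large distance to denser points.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:  # a point's parent is always labeled before the point
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    noise = [i for i in range(n) if rho[i] < rho_min and i not in centers]
    return centers, labels, noise


def cluster_purity(labels, classes, noise):
    """Share of non-noise points matching their cluster's majority class."""
    skip = set(noise)
    counts = {}
    for i, (cluster_id, y) in enumerate(zip(labels, classes)):
        if i not in skip:
            counts.setdefault(cluster_id, Counter())[y] += 1
    kept = sum(sum(c.values()) for c in counts.values())
    if kept == 0:
        return 0.0
    return sum(max(c.values()) for c in counts.values()) / kept


def select_features(data, classes, d_c, n_centers):
    """Greedy forward selection: add the feature that most improves purity."""
    remaining = list(range(len(data[0])))
    chosen, best = [], 0.0
    while remaining:
        scored = []
        for f in remaining:
            cols = chosen + [f]
            pts = [tuple(row[c] for c in cols) for row in data]
            _, labels, noise = density_peaks(pts, d_c, n_centers)
            scored.append((cluster_purity(labels, classes, noise), f))
        score, f = max(scored, key=lambda t: t[0])
        if score <= best:  # stop once no candidate improves purity
            break
        chosen.append(f)
        remaining.remove(f)
        best = score
    return chosen, best


# Toy data: feature 0 separates the classes, feature 1 mixes them.
data = [(0.0, 1.0), (0.1, 5.0), (0.2, 1.2), (0.1, 5.2),
        (5.0, 1.1), (5.1, 5.1), (5.2, 1.3), (5.1, 5.3)]
classes = ["A", "A", "A", "A", "B", "B", "B", "B"]
chosen, best = select_features(data, classes, d_c=0.5, n_centers=2)
print(chosen, best)
```

On this toy data the sketch keeps only feature 0, because clustering on feature 1 alone puts both classes in each cluster. Unlike the method summarized in the abstract, which removes noise once before evaluating features, the sketch reclusters for every candidate subset, which keeps the code short at the cost of extra computation.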

Key words: Feature selection, High-dimensional data, Noisy data, Rough sets, Density peak clustering

CLC Number: TP391