计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 159-166.doi: 10.11896/jsjkx.220500169
殷杏子, 彭宁宁, 詹学燕
YIN Xingzi, PENG Ningning, ZHAN Xueyan
摘要: 现有的过滤式特征选择算法忽略了特征之间的关联性。鉴于此,提出了一种新的过滤式特征选择算法——基于持续同调的特征选择算法(Rel-Betti算法),该算法能够识别特征之间的关联性以及组合效果。通过提出相关贝蒂数概念,筛选出数据集中重要的拓扑特征信息。该算法对数据集进行预处理后,根据类标签将数据集分类,计算不同类中的相关贝蒂数,获得数据信息的特征均值,按特征均值差值大小对特征进行重要性排序。利用UCI数据集中的8个数据,将该算法与其他常见算法在决策树、随机森林、K近邻和支持向量机这4种学习模型下进行比较实验。结果表明,该算法是一种有效的特征选择算法,其能够提高分类的准确率和F1值,并且不依赖于特定的机器学习模型。
中图分类号:
[1]SHI Q J,PAN F,LONG F H,et al.A Review of Research on Feature Selection Methods[J].Microelectronics and Computers,2022,39(3):1-8. [2]YU L,LIU H.Efficient feature selection via analysis of rele-vance and redundancy[J].The Journal of Machine Learning Research,2004,5:1205-1224. [3]DROTÁR P,GAZDA J,SMÉKAL Z.An experimental comparison of feature selection methods on two-class biomedical datasets[J].Computers in Biology and Medicine,2015,66:1-10. [4]BOMMERT A,SUN X,BISCHL B,et al.Benchmark for filter methods for feature selection in high-dimensional classification data[J].Computational Statistics & Data Analysis,2020,143:106839. [5]JI Z W,HU M.A Double Filtering Feature Selection Algorithm [J].Computer Engineering and Applications,2011,47(19):190-193,206. [6]XU Y,HU X G,LI P P.A Filtered Feature Selection Algorithm Based on Group Policy[J].Application Research of Computers,2016,33(5):1322-1326. [7]SINGH N,SINGH P.A hybrid ensemble-filter wrapper feature selection approach for medical data classification[J].Chemome-trics and Intelligent Laboratory Systems,2021,217:104-131. [8]PRISCILLA C V,PRABHA D P.A two-phase feature selection technique using mutual information and XGB-RFE for credit card fraud detection[J].International Journal of Advanced Technology and Engineering Exploration,2021,8(85):1656. [9]PASHAEI E,PASHAEI E.Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data[J].The Journal of Supercomputing,2022:78(13):15598-15637. [10]YANG Y K.Research on filtering feature selection algorithm based on mutual information[D].Changchun:Jilin University,2022. [11]LUM P Y,SINGH G,LEHMAN A,et al.Extracting insights from the shape of complex data using topology[J].Scientific Reports,2013,3(1):1-8. [12]ZOMORODIAN A,CARLSSON G.Computing persistent ho-mology[C]//Proceedings of the twentieth Annual Symposium on Computational Geometry.2004:347-356.. [13]CARLSSON G.Topology And Data[J].Bulletin of the American Mathematical Society,2009,46(2):255-308. [14]LOCKWOOD S,KRISHNAMOORTHY B.Topological Fea-tures in Cancer Gene Expression Data[J].Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing,2015,20(7):108-119. [15]CANG Z X,MU L,WEI G W,et al.Representability of alge-braic topology for biomolecules in machine learning based scoring and virtual screening[J].Plos Computational Biology,2018,14(1):e1005929. [16]CHI S P,XIA K,SI X L.Persistent-Homology-based Machine Learning and its Applications-A Survey[J].Artificial Intelligence Review,2022,55(7):5169-5213. [17]OTTER N,PORTER M A,TILLMANN U,et al.A roadmap for the computation of persistent homology[J].Epj Data Science,2017,6(1):1-38. [18]GHRIST R.Barcodes:The persistent topology of data[J].Bulletin of the American Mathematical Society,2008,45(1):61-75. [19]GAO B L,ZHOU Z G,YANG W W,et,al.A Feature Selection Method Combining Category-Based and Improved CHI[J].Application Research of Computers,2018,35(6):1660-1662. [20]WANG Y R,TANG M.Evaluation of side channel leakage basedon Bartlett and multi-class F-test [J].Journal of Communications,2021,42(12):35-43. [21]WU Y,LIU Y H,YANG W W,et,al.Feature Selection Method Based on Nearest Farthest Neighbor and Mutual Information[J].Research of Computers 2017,34(12):3713-3716. [22]WANG W Y,LIU C,ZHAO Q,et,al.Direct Verified Encapsulated Feature Selection Method [J].University of Electronic Science and Technology of China Journal,2016,45(4):607-615. |
|