Computer Science ›› 2023, Vol. 50 ›› Issue (6): 159-166.doi: 10.11896/jsjkx.220500169

• Database & Big Data & Data Science • Previous Articles     Next Articles

Filtered Feature Selection Algorithm Based on Persistent Homology

YIN Xingzi, PENG Ningning, ZHAN Xueyan   

  1. School of Science,Wuhan University of Technology,Wuhan 430070,China
  • Received:2022-05-18 Revised:2022-08-11 Online:2023-06-15 Published:2023-06-06
  • About author:YIN Xingzi,born in 1998,postgraduate.Her main research interest is topology data analysis.PENG Ningning,born in 1985,associate professor,Ph.D.His main research interests include mathematical logic,computability theory and topology data analysis.
  • Supported by:
    National Natural Science Foundation of China(11701438).

Abstract: The existing filtering feature selection algorithm ignores the correlation between features.This paper proposes a new filtering feature selection algorithm —the feature selection algorithm based on persistent homology(Rel-Betti algorithm),which can consider the correlation between features and the combined effect.This paper gives a new definition named by relevant Betti numbers,which can filter out the important topological features in the dataset.The algorithm first preprocesses the data set,classifies the data set according to the class labels,calculates the relevant Betti numbers in different classes,obtains the feature mean of the data information,and uses the feature mean difference to sort the importance of the features.Using eight data in UCI,the algorithm is compared with other common algorithms under four learning models:decision tree,random forest,K-nearest neighbor and support vector machine.Experimental results show that the Rel-Betti algorithm is an effective method that can improve classification accuracy and F1 value,and does not depend on a specific machine learning model.

Key words: Feature selection, Persistent homology, Barcode, Betti number, Machine learning

CLC Number: 

  • O189.22
[1]SHI Q J,PAN F,LONG F H,et al.A Review of Research on Feature Selection Methods[J].Microelectronics and Computers,2022,39(3):1-8.
[2]YU L,LIU H.Efficient feature selection via analysis of rele-vance and redundancy[J].The Journal of Machine Learning Research,2004,5:1205-1224.
[3]DROTÁR P,GAZDA J,SMÉKAL Z.An experimental comparison of feature selection methods on two-class biomedical datasets[J].Computers in Biology and Medicine,2015,66:1-10.
[4]BOMMERT A,SUN X,BISCHL B,et al.Benchmark for filter methods for feature selection in high-dimensional classification data[J].Computational Statistics & Data Analysis,2020,143:106839.
[5]JI Z W,HU M.A Double Filtering Feature Selection Algorithm [J].Computer Engineering and Applications,2011,47(19):190-193,206.
[6]XU Y,HU X G,LI P P.A Filtered Feature Selection Algorithm Based on Group Policy[J].Application Research of Computers,2016,33(5):1322-1326.
[7]SINGH N,SINGH P.A hybrid ensemble-filter wrapper feature selection approach for medical data classification[J].Chemome-trics and Intelligent Laboratory Systems,2021,217:104-131.
[8]PRISCILLA C V,PRABHA D P.A two-phase feature selection technique using mutual information and XGB-RFE for credit card fraud detection[J].International Journal of Advanced Technology and Engineering Exploration,2021,8(85):1656.
[9]PASHAEI E,PASHAEI E.Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data[J].The Journal of Supercomputing,2022:78(13):15598-15637.
[10]YANG Y K.Research on filtering feature selection algorithm based on mutual information[D].Changchun:Jilin University,2022.
[11]LUM P Y,SINGH G,LEHMAN A,et al.Extracting insights from the shape of complex data using topology[J].Scientific Reports,2013,3(1):1-8.
[12]ZOMORODIAN A,CARLSSON G.Computing persistent ho-mology[C]//Proceedings of the twentieth Annual Symposium on Computational Geometry.2004:347-356..
[13]CARLSSON G.Topology And Data[J].Bulletin of the American Mathematical Society,2009,46(2):255-308.
[14]LOCKWOOD S,KRISHNAMOORTHY B.Topological Fea-tures in Cancer Gene Expression Data[J].Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing,2015,20(7):108-119.
[15]CANG Z X,MU L,WEI G W,et al.Representability of alge-braic topology for biomolecules in machine learning based scoring and virtual screening[J].Plos Computational Biology,2018,14(1):e1005929.
[16]CHI S P,XIA K,SI X L.Persistent-Homology-based Machine Learning and its Applications-A Survey[J].Artificial Intelligence Review,2022,55(7):5169-5213.
[17]OTTER N,PORTER M A,TILLMANN U,et al.A roadmap for the computation of persistent homology[J].Epj Data Science,2017,6(1):1-38.
[18]GHRIST R.Barcodes:The persistent topology of data[J].Bulletin of the American Mathematical Society,2008,45(1):61-75.
[19]GAO B L,ZHOU Z G,YANG W W,et,al.A Feature Selection Method Combining Category-Based and Improved CHI[J].Application Research of Computers,2018,35(6):1660-1662.
[20]WANG Y R,TANG M.Evaluation of side channel leakage basedon Bartlett and multi-class F-test [J].Journal of Communications,2021,42(12):35-43.
[21]WU Y,LIU Y H,YANG W W,et,al.Feature Selection Method Based on Nearest Farthest Neighbor and Mutual Information[J].Research of Computers 2017,34(12):3713-3716.
[22]WANG W Y,LIU C,ZHAO Q,et,al.Direct Verified Encapsulated Feature Selection Method [J].University of Electronic Science and Technology of China Journal,2016,45(4):607-615.
[1] HUANG Yuhang, SONG You, WANG Baohui. Improved Forest Optimization Feature Selection Algorithm for Credit Evaluation [J]. Computer Science, 2023, 50(6A): 220600241-6.
[2] WANG Xiya, ZHANG Ning, CHENG Xin. Review on Methods and Applications of Text Fine-grained Emotion Recognition [J]. Computer Science, 2023, 50(6A): 220900137-7.
[3] WANG Dongli, YANG Shan, OUYANG Wanli, LI Baopu, ZHOU Yan. Explainability of Artificial Intelligence:Development and Application [J]. Computer Science, 2023, 50(6A): 220600212-7.
[4] WANG Jinjin, CHENG Yinhui, NIE Xin, LIU Zheng. Fast Calculation Method of High-altitude Electromagnetic Pulse Environment Based on Machine Learning [J]. Computer Science, 2023, 50(6A): 220500046-5.
[5] SUN Lin, LI Mengmeng, XU Jiucheng. Binary Harris Hawk Optimization and Its Feature Selection Algorithm [J]. Computer Science, 2023, 50(5): 277-291.
[6] CHEN Jinjie, HE Chao, XIAO Xiao, LEI Yinjie. Optical Performance Monitoring Method Based on Fine-grained Constellation Diagram Recognition [J]. Computer Science, 2023, 50(4): 220-225.
[7] PENG Yuefeng, ZHAO Bo, LIU Hui, AN Yang. Survey on Membership Inference Attacks Against Machine Learning [J]. Computer Science, 2023, 50(3): 351-359.
[8] XU Xia, ZHANG Hui, YANG Chunming, LI Bo, ZHAO Xujian. Fair Method for Spectral Clustering to Improve Intra-cluster Fairness [J]. Computer Science, 2023, 50(2): 158-165.
[9] CHEN Depeng, LIU Xiao, CUI Jie, HE Daojing. Survey of Membership Inference Attacks for Machine Learning [J]. Computer Science, 2023, 50(1): 302-317.
[10] WANG Yitan, WANG Yishu, YUAN Ye. Survey of Learned Index [J]. Computer Science, 2023, 50(1): 1-8.
[11] CHEN Yijun, GAO Haoran, DING Zhijun. Credit Evaluation Model Based on Dynamic Machine Learning [J]. Computer Science, 2023, 50(1): 59-68.
[12] LENG Dian-dian, DU Peng, CHEN Jian-ting, XIANG Yang. Automated Container Terminal Oriented Travel Time Estimation of AGV [J]. Computer Science, 2022, 49(9): 208-214.
[13] NING Han-yang, MA Miao, YANG Bo, LIU Shi-chang. Research Progress and Analysis on Intelligent Cryptology [J]. Computer Science, 2022, 49(9): 288-296.
[14] HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
[15] LI Bin, WAN Yuan. Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment [J]. Computer Science, 2022, 49(8): 86-96.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!