Computer Science ›› 2021, Vol. 48 ›› Issue (4): 91-96.doi: 10.11896/jsjkx.200800025

• Database & Big Data & Data Science •

Relief Feature Selection Algorithm Based on Label Correlation

DING Si-fan, WANG Feng, WEI Wei   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2020-06-24  Revised: 2020-09-22  Online: 2021-04-15  Published: 2021-04-09
  • About author: DING Si-fan, born in 1999, bachelor, is a student member of China Computer Federation. His main research interests include data mining and machine learning. (sifan_ding_0718@163.com)
    WANG Feng, born in 1984, Ph.D. Her main research interests include feature selection, rough set theory, granular computing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61772323) and the Basic Applied Research Project of Shanxi Province of China (201801D221170).

Abstract: Feature selection plays a vital role in machine learning and data mining. Relief, an efficient filter-based feature selection algorithm, is widely used because it can handle multiple data types and is highly tolerant of noise. However, the classic Relief algorithm evaluates discrete features in a relatively crude way: the potential relationship between features and class labels is not fully exploited, leaving considerable room for improvement. To address this shortcoming, a discrete-feature evaluation method based on label correlation is proposed. The algorithm takes the characteristics of different feature types fully into account and provides a distance measure for mixed features. At the same time, starting from the correlation between discrete features and class labels, it redefines the way the Relief algorithm evaluates discrete features. Experimental results show that, compared with the classic Relief algorithm and several existing feature selection algorithms for mixed data, the improved Relief algorithm achieves varying degrees of improvement in classification accuracy and performs well overall.
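To make the mixed-feature distance concrete, the sketch below is a minimal Python illustration of the classic Relief loop combined with the value difference metric (VDM) for discrete features and a range-normalised absolute difference for numeric ones. It is an outline under stated assumptions, not the paper's implementation: the label-correlation evaluation the paper proposes for discrete features is not reproduced here, the names vdm and relief_mixed are illustrative, and discrete features are assumed to be integer-coded so they can share one numeric array with the continuous ones.

import numpy as np

def vdm(col, y, a, b):
    # Value Difference Metric: distance between two values a, b of one
    # discrete feature, summed over class labels as |P(c|a) - P(c|b)|,
    # so values with similar label distributions are close.
    mask_a, mask_b = (col == a), (col == b)
    na, nb = mask_a.sum(), mask_b.sum()
    d = 0.0
    for c in np.unique(y):
        pa = (mask_a & (y == c)).sum() / na if na else 0.0
        pb = (mask_b & (y == c)).sum() / nb if nb else 0.0
        d += abs(pa - pb)
    return d

def relief_mixed(X, y, is_discrete, n_iter=50, seed=0):
    # Classic Relief loop with a mixed-feature distance: VDM for discrete
    # features, range-normalised absolute difference for numeric ones.
    # Assumes every class has at least two instances.
    rng = np.random.default_rng(seed)
    n, m = X.shape
    span = np.where(is_discrete, 1.0, X.max(axis=0) - X.min(axis=0))
    span[span == 0] = 1.0  # guard against constant numeric features

    def feat_diff(f, i, j):
        if is_discrete[f]:
            return vdm(X[:, f], y, X[i, f], X[j, f])
        return abs(X[i, f] - X[j, f]) / span[f]

    w = np.zeros(m)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.array([sum(feat_diff(f, i, j) for f in range(m))
                         if j != i else np.inf for j in range(n)])
        hit = min((j for j in range(n) if j != i and y[j] == y[i]),
                  key=lambda j: dist[j])
        miss = min((j for j in range(n) if y[j] != y[i]),
                   key=lambda j: dist[j])
        for f in range(m):
            # reward features that separate the nearest miss,
            # penalise features that separate the nearest hit
            w[f] += (feat_diff(f, i, miss) - feat_diff(f, i, hit)) / n_iter
    return w

Features with the largest weights are then retained. Note that VDM already encodes a form of label correlation: two discrete values are close exactly when they induce similar class-label distributions, which is the relationship between discrete features and class labels that the abstract builds on.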

Key words: Decision tree, Feature selection, Label correlation, Relief, VDM

CLC Number: TP181