Computer Science ›› 2021, Vol. 48 ›› Issue (4): 91-96. doi: 10.11896/jsjkx.200800025

• Database & Big Data & Data Science •

Relief Feature Selection Algorithm Based on Label Correlation

DING Si-fan, WANG Feng, WEI Wei   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2020-06-24 Revised: 2020-09-22 Online: 2021-04-15 Published: 2021-04-09
  • Corresponding author: WANG Feng (sxuwangfeng@126.com)
  • About author: DING Si-fan, born in 1999, bachelor, is a student member of China Computer Federation. His main research interests include data mining and machine learning. (sifan_ding_0718@163.com)
    WANG Feng, born in 1984, Ph.D. Her main research interests include feature selection, rough set theory, granular computing and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61772323) and Basic Applied Research Project of Shanxi Province of China (201801D221170).

Abstract: Feature selection plays a vital role in machine learning and data mining. Relief, an efficient filtering feature selection algorithm, is widely used because it can handle multiple types of data and tolerates noise well. However, the classic Relief algorithm evaluates discrete features in a rather simple way and does not fully exploit the potential relationship between features and class labels during feature selection, leaving considerable room for improvement. To address this shortcoming, a discrete feature evaluation method based on label correlation is proposed. The method fully considers the characteristics of different feature types, gives a distance measure for mixed features, and, starting from the correlation between discrete features and labels, redefines how Relief evaluates discrete features. Experimental results show that, compared with the classic Relief algorithm and several existing feature selection algorithms for mixed data, the improved Relief algorithm achieves varying degrees of improvement in classification accuracy and performs well.

Key words: Decision tree, Feature selection, Label correlation, Relief, VDM

CLC Number: TP181
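
The abstract describes the approach only at a high level, so the following Python sketch merely illustrates the general idea under stated assumptions: a classic Relief-style weight update combined with a mixed distance in which numeric features use a range-normalized difference and discrete features use a VDM-style (Value Difference Metric) distance computed from class-conditional value frequencies. It is not the paper's exact evaluation scheme; the names `relief_mixed` and `vdm_table` and all parameters are illustrative.

```python
# Minimal sketch (assumptions noted above): Relief-style weighting with a
# mixed distance; discrete features are assumed to be integer-coded.
import numpy as np

def vdm_table(col, y, classes):
    """Class-conditional value frequencies P(class | value) for one discrete feature."""
    table = {}
    for v in np.unique(col):
        mask = (col == v)
        table[v] = np.array([np.mean(y[mask] == c) for c in classes])
    return table

def relief_mixed(X, y, discrete, n_iter=100, seed=None):
    """Return a length-d array of feature weights (larger = more relevant)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    classes = np.unique(y)
    # VDM tables for discrete columns; value ranges for numeric columns.
    tables = {j: vdm_table(X[:, j], y, classes) for j in range(d) if discrete[j]}
    ranges = np.ptp(X, axis=0) + 1e-12

    def diff(j, a, b):
        if discrete[j]:
            # VDM-style distance: how differently the two values distribute over classes.
            return float(np.abs(tables[j][a[j]] - tables[j][b[j]]).sum())
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        xi, yi = X[i], y[i]
        same = [k for k in range(n) if k != i and y[k] == yi]
        other = [k for k in range(n) if y[k] != yi]
        if not same or not other:
            continue
        hit = X[min(same, key=lambda k: dist(xi, X[k]))]
        miss = X[min(other, key=lambda k: dist(xi, X[k]))]
        for j in range(d):
            # Reward features that differ on the nearest miss and agree on the nearest hit.
            w[j] += diff(j, xi, miss) - diff(j, xi, hit)
    return w / n_iter
```

Features can then be ranked by the returned weights and the top-ranked subset passed to a downstream classifier such as a decision tree, in line with the keywords above.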