Computer Science, 2022, Vol. 49, Issue (4): 152-160. doi: 10.11896/jsjkx.210300094

• Database & Big Data & Data Science •

Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief

SUN Lin1,2, HUANG Miao-miao1,3, XU Jiu-cheng1,2

  1 College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China;
  2 Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, Henan 453007, China;
  3 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2021-03-09 Revised: 2021-07-29 Published: 2022-04-01
  • Corresponding author: SUN Lin (sunlin@htu.edu.cn)
  • About author: SUN Lin, born in 1979, Ph.D., associate professor, master's supervisor. His main research interests include granular computing, big data mining, machine learning and bioinformatics.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62076089, 61772176, 61976082) and the Key Science and Technology Program of Henan Province, China (212102210136).

Abstract: In multi-label learning and classification, existing feature selection algorithms based on neighborhood rough sets that take the classification margin of samples as the neighborhood radius have three drawbacks: an overly large margin can make the classification meaningless, overly large distances between samples can invalidate the heterogeneous (different-class) and homogeneous (same-class) samples, and weak label data cannot be handled. To address these issues, a weak label feature selection method based on multi-label neighborhood rough sets and multi-label Relief is proposed. First, the numbers of heterogeneous and homogeneous samples are introduced to improve the classification margin; on this basis, the neighborhood radius is defined, and a new neighborhood approximation accuracy and a multi-label neighborhood rough sets model are constructed, which effectively measure the uncertainty of sets caused by the boundary region. Second, an iteratively updated weight formula is employed to fill in most of the missing labels, and the neighborhood approximation accuracy is combined with mutual information to construct a new label correlation that fills in the remaining missing labels. Third, the numbers of heterogeneous and homogeneous samples are used to construct new formulas for label weights and feature weights, from which a multi-label Relief model is proposed and applied to multi-label feature selection. Finally, by combining the multi-label neighborhood rough sets model with the multi-label Relief algorithm, a weak label feature selection algorithm is designed to process high-dimensional data with missing labels and effectively improve multi-label classification performance. Simulation experiments on eleven public multi-label data sets verify the effectiveness of the proposed weak label feature selection algorithm.

Key words: Feature selection, Missing labels, Multi-label learning, Neighborhood rough sets, Relief
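
For orientation, the two baseline ideas the abstract starts from can be sketched in a few lines of Python. This is only an illustrative sketch of the classic single-label versions, not the multi-label, weak-label formulas proposed in the paper, which are not given on this page. It assumes numeric features scaled to [0, 1], one class label per sample, and at least two samples in each class. The function names (`nearest_hit_and_miss`, `margin`, `neighborhood`, `relief_weights`) are hypothetical. The margin of a sample is taken as the distance to its nearest different-class ("heterogeneous") sample minus the distance to its nearest same-class ("similar") sample, and that margin is then used as the neighborhood radius.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_hit_and_miss(X: List[List[float]], y: List[int], i: int) -> Tuple[int, int]:
    """Indices of the nearest same-class (hit) and different-class (miss) sample."""
    hit, miss = -1, -1
    hit_d, miss_d = math.inf, math.inf
    for j in range(len(X)):
        if j == i:
            continue
        d = euclidean(X[i], X[j])
        if y[j] == y[i]:
            if d < hit_d:
                hit, hit_d = j, d
        elif d < miss_d:
            miss, miss_d = j, d
    return hit, miss


def margin(X: List[List[float]], y: List[int], i: int) -> float:
    """Classic classification margin: dist(nearest miss) - dist(nearest hit)."""
    hit, miss = nearest_hit_and_miss(X, y, i)
    return euclidean(X[i], X[miss]) - euclidean(X[i], X[hit])


def neighborhood(X: List[List[float]], y: List[int], i: int) -> List[int]:
    """Samples within the margin-based radius of sample i (assumed baseline)."""
    radius = max(0.0, margin(X, y, i))  # a negative margin is clipped to 0
    return [j for j in range(len(X)) if euclidean(X[i], X[j]) <= radius]


def relief_weights(X: List[List[float]], y: List[int]) -> List[float]:
    """Classic Relief: reward features that differ on misses, not on hits."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    for i in range(n):
        hit, miss = nearest_hit_and_miss(X, y, i)
        for f in range(m):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            w[f] += (diff_miss - diff_hit) / n
    return w
```

On a toy data set such as `X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]` with `y = [0, 0, 1, 1]`, calling `relief_weights(X, y)` ranks the first feature, which separates the two classes, above the second. The sketch also makes the abstract's concern concrete: when the nearest different-class sample is far away, the margin, and with it the neighborhood radius, grows without any upper bound.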

CLC Number: 

  • TP181