Computer Science, 2022, Vol. 49, Issue (4): 152-160. doi: 10.11896/jsjkx.210300094

• Database & Big Data & Data Science •

Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief

SUN Lin1,2, HUANG Miao-miao1,3, XU Jiu-cheng1,2   

  1 College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China;
  2 Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, Henan 453007, China;
  3 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2021-03-09; Revised: 2021-07-29; Published: 2022-04-01
  • About author: SUN Lin, born in 1979, Ph.D., associate professor, master's supervisor. His main research interests include granular computing, big data mining, machine learning and bioinformatics.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62076089, 61772176, 61976082) and the Key Science and Technology Program of Henan Province, China (212102210136).

Abstract: In multi-label learning and classification, existing feature selection algorithms based on neighborhood rough sets use the classification margin of samples as the neighborhood radius. However, when the margin is too large, the resulting classification may be meaningless; when the distances between samples are too large, abnormal heterogeneous or similar samples easily arise; and these algorithms cannot handle weak label data. To address these issues, a weak label feature selection method based on multi-label neighborhood rough sets and multi-label Relief is proposed. First, the numbers of heterogeneous and similar samples are introduced to improve the classification margin. On this basis, the neighborhood radius is defined, a new formula for neighborhood approximation accuracy is presented, and a multi-label neighborhood rough sets model is constructed that can effectively measure the uncertainty of sets in the boundary region. Second, an iteratively updated weight formula is employed to fill in most of the missing labels; then, by combining the neighborhood approximation accuracy with mutual information, a new measure of correlation between labels is developed to fill in the remaining missing labels. Third, the numbers of heterogeneous and similar samples are further used to improve the label weighting and feature weighting formulas, yielding a multi-label Relief model for multi-label feature selection. Finally, based on the multi-label neighborhood rough sets model and the multi-label Relief algorithm, a weak label feature selection algorithm is designed to process high-dimensional data sets with missing labels and improve multi-label classification performance. Simulation experiments on eleven public multi-label data sets verify the effectiveness of the proposed algorithm.
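
To make the terms in the abstract concrete, the following is a minimal Python sketch of the classical, single-label versions of two ingredients it names: the classification margin of a sample (distance to the nearest heterogeneous sample minus distance to the nearest similar sample) used as a neighborhood radius, and the neighborhood approximation accuracy (size of the lower approximation divided by size of the upper approximation). It is not the improved multi-label formulas proposed in the paper, which the abstract does not state. The function names, the Euclidean distance, the clamping of negative margins to zero, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def classification_margins(X, y):
    """Classical margin: dist(nearest miss) - dist(nearest hit) per sample."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a sample is not its own nearest hit
    same = y[:, None] == y[None, :]
    nearest_hit = np.where(same, D, np.inf).min(axis=1)
    nearest_miss = np.where(~same, D, np.inf).min(axis=1)
    return nearest_miss - nearest_hit


def approximation_accuracy(X, y, target_label, radii):
    """|lower approximation| / |upper approximation| of one label class.

    The neighborhood of sample i is every sample within radii[i] of it.
    """
    D = pairwise_distances(X)
    in_target = y == target_label
    neighbors = D <= radii[:, None]  # row i is the neighborhood of sample i
    n = len(X)
    # Lower approximation: the whole neighborhood lies inside the class.
    lower = np.array([in_target[neighbors[i]].all() for i in range(n)])
    # Upper approximation: the neighborhood touches the class.
    upper = np.array([in_target[neighbors[i]].any() for i in range(n)])
    return lower.sum() / upper.sum()


# Hypothetical toy data: one feature, two well-separated classes.
X_demo = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

margins = classification_margins(X_demo, y_demo)
# Assumption: negative margins are clamped to zero so a radius is never negative.
margin_radii = np.clip(margins, 0.0, None)
acc_margin = approximation_accuracy(X_demo, y_demo, 0, margin_radii)

# A single oversized radius lets neighborhoods cross the class boundary.
fixed_radii = np.full(len(X_demo), 0.85)
acc_fixed = approximation_accuracy(X_demo, y_demo, 0, fixed_radii)

print(margins, acc_margin, acc_fixed)
```

On this toy data the margin-based radii give an approximation accuracy of 1.0 for class 0, whereas the fixed radius of 0.85 gives 0.5, because some neighborhoods then mix heterogeneous and similar samples. This is one simple way to see why the choice of neighborhood radius matters for the uncertainty measure.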

Key words: Feature selection, Missing labels, Multi-label learning, Neighborhood rough sets, Relief

CLC Number: TP181