Computer Science, 2024, Vol. 51, Issue (7): 96-107. doi: 10.11896/jsjkx.230400018

• Database & Big Data & Data Science •

Multilabel Feature Selection Based on Fisher Score with Center Shift and Neighborhood Intuitionistic Fuzzy Entropy

SUN Lin1, MA Tianjiao2

  1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
  2. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
  • Received: 2023-04-04 Revised: 2023-10-25 Online: 2024-07-15 Published: 2024-07-10
  • Corresponding author: SUN Lin (slinok@126.com)
  • About author: SUN Lin, born in 1979, Ph.D., professor, doctoral supervisor, is a member of CCF (No. 74144M). His main research interests include machine learning, data mining and bioinformatics.
  • Supported by:
    National Natural Science Foundation of China (62076089, 61772176).

Abstract: In existing multilabel Fisher score models, edge samples degrade classification performance. Neighborhood intuitionistic fuzzy entropy offers stronger expressive and discriminative power when handling uncertain information. This paper therefore proposes a multilabel feature selection method based on the Fisher score with center shift and neighborhood intuitionistic fuzzy entropy. First, the multilabel universe is partitioned into multiple sample sets according to the labels, and the feature mean of each sample set is taken as the original center point of the samples under that label. The distance to the farthest sample is multiplied by a distance coefficient to identify and remove the edge sample set, which defines a new effective sample set. The score of each feature under each label after center shift, and the feature score over the label set, are then calculated, yielding a multilabel Fisher score model based on center shift that is used to preprocess the multilabel data. Second, the multilabel classification margin is introduced as an adaptive fuzzy neighborhood radius parameter, and the fuzzy neighborhood similarity relation and fuzzy neighborhood granule are defined, from which the upper and lower approximations of multilabel fuzzy neighborhood rough sets are constructed. On this basis, multilabel neighborhood rough intuitionistic membership and non-membership functions are proposed, and the multilabel neighborhood intuitionistic fuzzy entropy is defined. Finally, formulas for the external and internal significance of features are given, and a multilabel feature selection algorithm based on neighborhood intuitionistic fuzzy entropy is designed to select the optimal feature subset. Experimental results on nine multilabel datasets with the multilabel K-nearest neighbor classifier show that the feature subset selected by the proposed algorithm achieves good classification performance.

Key words: Multilabel learning, Feature selection, Fisher score, Multilabel fuzzy neighborhood rough sets, Neighborhood intuitionistic fuzzy entropy
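
The center-shift preprocessing step summarized in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no formulas, so the trimming rule (drop samples farther from the mean than the distance coefficient times the largest distance), the two-class Fisher score per label, the summation over labels, and the names `center_shift_mask`, `fisher_scores_center_shift`, and `coeff` are all assumptions made for this example.

```python
import numpy as np


def center_shift_mask(X, coeff):
    """Boolean mask of samples kept after trimming edge samples (assumed rule)."""
    center = X.mean(axis=0)                      # original center point
    dist = np.linalg.norm(X - center, axis=1)    # distance of each sample to it
    mask = dist <= coeff * dist.max()            # False marks an edge sample
    # Guard: symmetric sets can be trimmed away entirely; keep them whole then.
    return mask if mask.any() else np.ones_like(mask)


def fisher_scores_center_shift(X, Y, coeff=0.8):
    """Per-feature score: sum over labels of a two-class Fisher score.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    coeff: hypothetical distance coefficient in (0, 1].
    """
    scores = np.zeros(X.shape[1])
    for j in range(Y.shape[1]):
        groups = []
        for value in (0, 1):                     # negative / positive samples
            S = X[Y[:, j] == value]
            if len(S) > 0:
                # Effective sample set; its mean is the shifted center.
                groups.append(S[center_shift_mask(S, coeff)])
        if len(groups) < 2:
            continue                             # label has only one class
        overall_mean = np.vstack(groups).mean(axis=0)
        between = sum(len(S) * (S.mean(axis=0) - overall_mean) ** 2 for S in groups)
        within = sum(len(S) * S.var(axis=0) for S in groups)
        scores += between / (within + 1e-12)     # small constant avoids 0/0
    return scores
```

Under these assumptions, features would be ranked by descending score and the top-ranked ones passed on to the entropy-based selection stage; setting `coeff=1.0` disables trimming and reduces the sketch to a plain per-label Fisher score.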

CLC Number:

  • TP181