Computer Science ›› 2025, Vol. 52 ›› Issue (4): 161-168. doi: 10.11896/jsjkx.240600008

• Database & Big Data & Data Science •

Semi-supervised Partial Multi-label Feature Selection

WU You1,2, WANG Jing1,2, LI Peipei1,2, HU Xuegang1,2,3

  1 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601
    2 Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601
    3 Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601
  • Received: 2024-05-30  Revised: 2024-08-21  Online: 2025-04-15  Published: 2025-04-14
  • Corresponding author: LI Peipei (peipeili@hfut.edu.cn)
  • About author: WU You (wuyou@mail.hfut.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (62376085, 62076085, 62120106008) and Research Funds of Center for Big Data and Population Health of IHM (JKS2023003)

Semi-supervised Partial Multi-label Feature Selection

WU You1,2, WANG Jing1,2, LI Peipei1,2, HU Xuegang1,2,3   

  1 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
    2 Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei 230601, China
    3 Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230601, China
  • Received: 2024-05-30  Revised: 2024-08-21  Online: 2025-04-15  Published: 2025-04-14
  • About author: WU You, born in 2002, master candidate. His main research interests include feature selection and multi-label learning.
    LI Peipei, born in 1982, Ph.D, professor, Ph.D supervisor. Her main research interests include data stream mining and knowledge engineering.
  • Supported by:
    National Natural Science Foundation of China (62376085, 62076085, 62120106008) and Research Funds of Center for Big Data and Population Health of IHM (JKS2023003).

Abstract: Multi-label feature selection is an effective feature dimensionality reduction technique that aims to select a discriminative subset of features from the original feature space. However, traditional multi-label feature selection methods face the problem of degraded annotation accuracy. In real-world data, instances are annotated with candidate label sets in which noisy labels are mixed with the relevant ones, i.e., partial multi-label data. Existing multi-label feature selection algorithms usually assume that training samples are accurately labeled, or only consider the case of missing labels. Moreover, in real-world scenarios, only a small portion of large-scale high-dimensional multi-label datasets is labeled. Therefore, this paper proposes a novel semi-supervised partial multi-label feature selection method. First, for the partial multi-label problem, the true relationships among labels are learned from samples with known labels, and manifold regularization is then used to maintain the structural consistency between the feature space and the label space. Second, for the missing-label problem, a label propagation algorithm is used to enhance the label information. In addition, for the high-dimensional feature problem, a low-rank constraint is imposed on the mapping matrix to reveal latent correlations among labels, and an l2,1-norm constraint is introduced to select features with strong discriminative ability. Experimental results show that the proposed method has significant performance advantages over existing semi-supervised multi-label feature selection methods.
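For illustration only, the four ingredients named above (a mapping fitted to the labels, manifold regularization, a low-rank constraint on the mapping matrix, and an l2,1-norm penalty) are typically combined in an objective of the following form; the variable names, trade-off weights and the exact combination below are assumptions, not the paper's actual formulation.

```latex
% Illustrative sketch: X = data matrix, W = feature-to-label mapping, F = (propagated) label matrix,
% L = graph Laplacian built on the feature space; alpha, beta, gamma are assumed trade-off weights.
\min_{W,\,F}\;
    \lVert X W - F \rVert_F^2                             % fit the mapping to the labels
  + \alpha \,\operatorname{tr}\!\bigl(F^{\top} L F\bigr)  % manifold regularization (structural consistency)
  + \beta  \,\operatorname{rank}(W)                       % low-rank constraint, often relaxed to the nuclear norm
  + \gamma \,\lVert W \rVert_{2,1}                        % row sparsity, used to rank and select features
```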

Keywords: Multi-label feature selection, Partial multi-label learning, Semi-supervised learning, Feature dimensionality reduction, Noisy labels

Abstract: Multi-label feature selection is a technique for reducing feature dimensionality by selecting a subset of discriminative features from the original feature space. However, traditional methods face the problem of degraded labeling accuracy. In real-world data, instances are annotated with a set of candidate labels that may include noisy labels in addition to the relevant ones, resulting in partial multi-label data. Existing multi-label feature selection algorithms typically assume accurate labeling of training samples, or only consider missing labels. Furthermore, large-scale high-dimensional multi-label datasets in real situations often have only a small portion of labeled data. Therefore, this paper presents a new semi-supervised partial multi-label feature selection method. Firstly, to address the partial multi-label issue, the true relationships between labels are learned from samples with known labels, and the structural consistency between the feature space and the label space is maintained by manifold regularization. Secondly, to address the missing-label issue, unlabeled data are taken into account and the label information is enhanced by a label propagation algorithm. Additionally, to handle high-dimensional features, a low-rank constraint is applied to the mapping matrix to expose implicit correlations between labels, and an l2,1-norm constraint is introduced to select features with strong discriminative ability. Experimental results demonstrate significant performance advantages of the proposed method over existing semi-supervised multi-label feature selection methods.
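As a concrete sketch of the pipeline described in the abstract (not the authors' implementation: the graph construction, propagation rule, solver, and all parameter values below are assumptions), label propagation over a k-NN graph followed by l2,1-regularized feature ranking can be prototyped in a few lines of NumPy:

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Symmetric k-NN affinity matrix with an RBF kernel (assumed graph construction)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)     # pairwise squared distances
    W = np.zeros_like(d2)
    for i in range(X.shape[0]):
        nn = np.argsort(d2[i])[1:k + 1]                  # nearest neighbors, skipping the point itself
        W[i, nn] = np.exp(-d2[i, nn] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                            # symmetrize

def propagate_labels(W, Y, labeled, alpha=0.9, n_iter=50):
    """Plain label propagation F <- alpha*S*F + (1-alpha)*Y0 on the normalized graph."""
    d = W.sum(axis=1) + 1e-12
    S = W / np.sqrt(np.outer(d, d))                      # symmetric normalization
    Y0 = np.where(labeled[:, None], Y, 0.0)              # zero out rows of unlabeled samples
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    return F

def l21_feature_ranking(X, F, lam=0.5, n_iter=100):
    """Rank features by row norms of W from min_W ||XW - F||_F^2 + lam*||W||_{2,1},
    solved with iteratively reweighted least squares."""
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ F)  # weighted ridge subproblem
        row = np.linalg.norm(W, axis=1) + 1e-12
        D = np.diag(1.0 / (2.0 * row))                   # reweighting for the l2,1 term
    return np.argsort(-np.linalg.norm(W, axis=1))        # most discriminative features first

# Toy usage: 100 samples, 20 features, 5 labels, roughly 30% of the samples labeled.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = (rng.random((100, 5)) > 0.7).astype(float)
labeled = rng.random(100) < 0.3
F = propagate_labels(knn_graph(X), Y, labeled)
print(l21_feature_ranking(X, F)[:10])
```

The iteratively reweighted step replaces the non-smooth l2,1 norm with a weighted ridge penalty; the paper's actual optimization additionally handles the low-rank constraint on the mapping matrix and the learning of true label relations from partial labels.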

Key words: Multi-label feature selection, Partial multi-label learning, Semi-supervised learning, Feature dimension reduction, Noisy labels

CLC Number: TP181