Computer Science ›› 2026, Vol. 53 ›› Issue (3): 151-157. doi: 10.11896/jsjkx.250600149
葛泽庆, 黄圣君
GE Zeqing, HUANG Shengjun
Abstract: Tabular data is widely used in fields such as medicine, finance, and manufacturing, and its multi-label classification task is crucial for revealing the complex correlation structure of real-world problems. However, acquiring large-scale labeled datasets is often costly, which poses a challenge for research. Although semi-supervised learning has succeeded on image and text data by exploiting unlabeled samples, conventional methods are less effective on tabular data, which lacks inherent spatial or semantic structure. To address these challenges, a semi-supervised learning framework for multi-label tabular data is proposed. The method introduces a structure-preserving data augmentation scheme that adds Gaussian noise in the feature representation space so as to retain the original data structure, together with a consistency-based regularization technique that enforces agreement between a sample and its perturbed version to improve generalization. In addition, an attention-based mechanism is developed to selectively aggregate neighborhood information from labeled data, enabling the model to effectively exploit local feature correlations. Extensive experiments on 10 public multi-label tabular datasets demonstrate the effectiveness of the proposed method.
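The three components described above (Gaussian-noise augmentation in representation space, consistency regularization between a sample and its perturbed version, and attention-weighted aggregation of label information from labeled neighbors) can be sketched minimally as follows. The function names, the Euclidean-distance attention kernel, and the mean-squared consistency penalty are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def gaussian_augment(x, sigma=0.1, rng=None):
    """Structure-preserving augmentation: perturb a feature vector
    with small Gaussian noise in the representation space."""
    rng = np.random.default_rng(rng)
    return x + rng.normal(0.0, sigma, size=x.shape)

def consistency_loss(p, p_aug):
    """Consistency regularizer: penalize disagreement between
    predictions on a sample and on its augmented version."""
    return float(np.mean((p - p_aug) ** 2))

def attention_aggregate(query, labeled_feats, labeled_labels, temperature=1.0):
    """Attention over labeled neighbors: softmax of negative squared
    distances, used to form a soft multi-label estimate for `query`."""
    d2 = np.sum((labeled_feats - query) ** 2, axis=1)
    weights = np.exp(-d2 / temperature)
    weights = weights / weights.sum()
    return weights @ labeled_labels
```

In a training loop, the consistency term would be added (typically with a warm-up weight) to the supervised multi-label loss on the labeled subset, while the attention aggregation supplies neighborhood-informed targets for unlabeled samples.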