Computer Science, 2026, Vol. 53, Issue (3): 151-157. doi: 10.11896/jsjkx.250600149

• Database & Big Data & Data Science •

Semi-supervised Learning Method for Multi-label Tabular Data

GE Zeqing, HUANG Shengjun   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2025-06-24   Revised: 2025-08-22   Published: 2026-03-12
  • About author: GE Zeqing, born in 2000, postgraduate. His main research interests include multi-label learning and semi-supervised learning.
    HUANG Shengjun, born in 1987, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 42916S). His main research interests include machine learning and pattern recognition.
  • Supported by:
    Excellent Young Scientists Fund of the National Natural Science Foundation of China (62222605) and YQS Foundation (U2441285).

Abstract: Tabular data is ubiquitous in industrial applications, spanning fields such as medicine, finance, and manufacturing, where each sample is characterized by heterogeneous features. Multi-label classification for tabular data is crucial for capturing the complex, interconnected nature of real-world phenomena, yet obtaining large-scale labeled datasets is often costly. While semi-supervised learning has succeeded on image and text data by leveraging unlabeled samples, applying it to tabular data remains challenging: tabular data lack inherent spatial or semantic structure, which makes conventional augmentation and consistency-based methods less effective. To address these challenges, this paper proposes a novel semi-supervised learning framework tailored for multi-label tabular data. The approach introduces a structure-preserving data augmentation method that adds Gaussian noise in the feature representation space while preserving the original data structure, together with a consistency regularization term between samples and their perturbed versions to enhance generalization. Additionally, an attention-based mechanism is developed to selectively aggregate neighborhood information from labeled data, allowing the model to exploit local feature correlations. For unlabeled data, a state-of-the-art pseudo-labeling strategy is employed to iteratively refine model predictions. Extensive experiments are conducted on ten public multi-label tabular datasets covering various domains to validate the robustness of the proposed method. The results demonstrate the effectiveness of the proposed method, advancing the state of semi-supervised multi-label learning for tabular data.
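The abstract names four ingredients (Gaussian noise in representation space, a consistency term, attention over labeled neighbors, and pseudo-labeling) without giving formulas. The following is a minimal NumPy sketch of how such pieces are commonly realized, under assumptions that do not come from the paper: the encoder output is a plain matrix `h`, the noise is isotropic with a hypothetical scale `sigma`, consistency is the squared gap between sigmoid outputs of the clean and noisy views, neighborhood attention is scaled dot-product softmax over the `k` most similar labeled rows, and pseudo-labels use a simple per-label threshold that matches each label's positive rate in the labeled set. That last rule is a swapped-in illustration, not necessarily the pseudo-labeling strategy the authors adopt. All function names and the classifier head `W` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gaussian_feature_noise(h, sigma, rng):
    # Perturb encoder outputs h (n x d) with isotropic Gaussian noise.
    # Noise acts on the representation, not on the raw heterogeneous columns,
    # so categorical codes and value ranges in the input are left untouched.
    return h + sigma * rng.standard_normal(h.shape)


def consistency_loss(logits_clean, logits_noisy):
    # Mean squared gap between per-label probabilities of the two views.
    gap = sigmoid(logits_clean) - sigmoid(logits_noisy)
    return float(np.mean(gap ** 2))


def attend_labeled_neighbors(h_query, h_labeled, y_labeled, k):
    # Scaled dot-product scores between each query and every labeled row.
    d = h_query.shape[1]
    scores = h_query @ h_labeled.T / np.sqrt(d)            # (n_q, n_l)
    idx = np.argsort(-scores, axis=1)[:, :k]               # top-k neighbors
    top = np.take_along_axis(scores, idx, axis=1)          # (n_q, k)
    # Softmax over the k retained neighbors (shifted for numerical stability).
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Attention-weighted average of the neighbors' label vectors.
    return np.einsum("qk,qkl->ql", w, y_labeled[idx])       # (n_q, L)


def proportion_matched_pseudo_labels(probs_unlabeled, y_labeled):
    # Per-label threshold: the (1 - positive rate) quantile of the unlabeled
    # scores, so roughly the labeled-set positive rate is marked 1 (ties may add more).
    pos_rate = y_labeled.mean(axis=0)
    thresholds = np.array([
        np.quantile(probs_unlabeled[:, j], 1.0 - pos_rate[j])
        for j in range(probs_unlabeled.shape[1])
    ])
    return (probs_unlabeled >= thresholds).astype(int)


# Toy usage on random data; W is a hypothetical linear classifier head.
rng = np.random.default_rng(0)
n_l, n_u, d, n_labels = 20, 50, 8, 3
h_l = rng.standard_normal((n_l, d))
y_l = (rng.random((n_l, n_labels)) < 0.3).astype(float)
h_u = rng.standard_normal((n_u, d))
W = 0.1 * rng.standard_normal((d, n_labels))

h_u_noisy = gaussian_feature_noise(h_u, sigma=0.1, rng=rng)
loss_cons = consistency_loss(h_u @ W, h_u_noisy @ W)
neighbor_labels = attend_labeled_neighbors(h_u, h_l, y_l, k=5)
pseudo = proportion_matched_pseudo_labels(sigmoid(h_u @ W), y_l)
print(loss_cons, neighbor_labels.shape, pseudo.sum(axis=0))
```

Using cosine similarity instead of raw dot products, or aggregating neighbor representations rather than label vectors, are equally plausible readings of the abstract; the sketch fixes one choice only to make the pieces concrete.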

Key words: Tabular data, Multi-label classification, Semi-supervised learning, Data augmentation, Attention mechanism

CLC Number: TP181