Computer Science ›› 2023, Vol. 50 ›› Issue (6): 251-260. doi: 10.11896/jsjkx.220500100

• Artificial Intelligence •

Text Classification Method Based on Anti-noise and Double Distillation Technology

GUO Wei, HUANG Jiahui, HOU Chenyu, CAO Bin   

  1. College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2022-05-12 Revised: 2022-10-12 Online: 2023-06-15 Published: 2023-06-06
  • About author: GUO Wei, born in 2001, postgraduate, is a member of China Computer Federation. Her main research interest is natural language processing. HOU Chenyu, born in 1994, Ph.D, lecturer, Ph.D supervisor, is a member of China Computer Federation. His main research interests include data mining.
  • Supported by:
    National Natural Science Foundation of China (62276233) and Key R&D Program of Zhejiang Province (2022C01145).

Abstract: Text classification is an important and classic problem in natural language processing, commonly applied in news classification, sentiment analysis and other scenarios. Existing deep learning-based classification methods have three problems: 1) real-world datasets contain a large number of noisy labels, and training a model directly on such data seriously degrades its performance; 2) the introduction of pre-trained models has improved classification accuracy, but model size and the amount of inference computation have also grown significantly, making it challenging to use pre-trained models on resource-limited devices; 3) pre-trained models involve a large amount of redundant computation, which leads to low prediction efficiency when the data volume is large. To address these issues, this paper proposes a text classification method that combines anti-noise and double distillation (knowledge distillation and self-distillation). A threshold-based anti-noise method built on confidence learning, together with a new active learning sample selection algorithm, improves data quality at a small labeling cost. Meanwhile, the combination of knowledge distillation and self-distillation reduces model size and redundant computation, so the inference speed can be adjusted flexibly according to demand. Extensive experiments are performed on real datasets to evaluate the proposed method. Experimental results show that its accuracy increases by 1.18% after anti-noise processing, and it runs 4~8 times faster than BERT with only a small loss in accuracy.
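The abstract only sketches the two core mechanisms, so a minimal, hypothetical Python sketch is given below: (1) per-class threshold filtering of suspicious labels in the spirit of confident learning, and (2) a standard soft-target distillation loss that both the teacher-student (knowledge distillation) and layer-wise (self-distillation) steps could reuse. All function names, the temperature and the weighting factor alpha are illustrative assumptions, not the authors' implementation.

# A minimal sketch, assuming out-of-sample predicted probabilities are already
# available (e.g. from cross-validated model predictions); not the authors' code.
import numpy as np
import torch
import torch.nn.functional as F


def flag_noisy_labels(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Flag samples whose observed label looks unreliable.

    probs  -- (n_samples, n_classes) predicted probabilities
    labels -- (n_samples,) observed, possibly noisy, integer labels
    """
    n_classes = probs.shape[1]
    # Per-class threshold: mean self-confidence of samples carrying that label
    # (the thresholding idea used in confident learning).
    thresholds = np.array([probs[labels == c, c].mean() for c in range(n_classes)])
    # Suspicious if the probability of the observed label falls below its class threshold.
    return probs[np.arange(len(labels)), labels] < thresholds[labels]


def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of soft-target KL loss and hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so the soft term matches the hard-loss gradient scale
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1.0 - alpha) * hard

In the pipeline described above, samples flagged by flag_noisy_labels would presumably be handed to the active learning component for re-labeling; the exact selection strategy and the wiring of the two distillation stages are not specified in the abstract.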

Key words: Noisy label, Confidence learning, Active learning, Knowledge distillation, Self-distillation

CLC Number: TP391