Computer Science, 2023, Vol. 50, Issue (6): 251-260. doi: 10.11896/jsjkx.220500100
GUO Wei, HUANG Jiahui, HOU Chenyu, CAO Bin
Abstract: Text classification is an important and classic problem in natural language processing, commonly applied in scenarios such as news classification and sentiment analysis. Deep-learning-based classification methods have achieved considerable success, but three problems remain in practice: 1) real-world text data contains a large amount of label noise, and training a model directly on such data severely degrades its performance; 2) pre-trained models have improved classification accuracy, but their size and inference cost have also grown markedly, making it challenging to run them on resource-constrained devices; 3) pre-trained models involve a large amount of redundant computation, which leads to inefficient prediction when the data volume is large. To address these problems, this paper proposes a text classification method that combines noise resistance with double distillation (knowledge distillation and self-distillation). A confidence-learning-based threshold denoising method and a new active-learning sample selection algorithm improve data quality at a small annotation cost. Meanwhile, combining knowledge distillation with self-distillation reduces model size and redundant computation, so that the inference speed can be adjusted flexibly on demand. Extensive experiments on real-world datasets evaluate the performance of the method; the results show that denoising improves accuracy by 1.18%, and that the method achieves a 4x-8x speedup over BERT with only a small loss in accuracy.
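The abstract describes the method only at a high level. The sketch below is a minimal PyTorch illustration, not the authors' implementation, of how a "double distillation" objective and threshold-based adaptive inference of this kind are commonly wired together: a knowledge-distillation term matching the student's final classifier to a fine-tuned BERT teacher, plus a self-distillation term matching each early-exit branch classifier to the student's final layer, with inference stopping at the first branch that is confident enough. All function names and hyper-parameters (temperature, loss weight, entropy threshold) are illustrative assumptions; the confident-learning denoising and active-learning sample selection steps are omitted here.

```python
# Minimal sketch of "double distillation" (knowledge distillation + self-distillation)
# with entropy-thresholded adaptive inference.
# NOT the authors' code; all names and hyper-parameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def soft_target_loss(logits, target_logits, T=2.0):
    """Temperature-softened KL divergence used for both distillation terms."""
    return F.kl_div(
        F.log_softmax(logits / T, dim=-1),
        F.softmax(target_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def double_distill_loss(final_logits, branch_logits, teacher_logits, labels,
                        alpha=0.5, T=2.0):
    """Cross-entropy on the (denoised) labels
       + alpha * knowledge distillation from the BERT teacher to the student's final layer
       + self-distillation from the final layer to every early-exit branch classifier."""
    ce = F.cross_entropy(final_logits, labels)
    kd = soft_target_loss(final_logits, teacher_logits, T)
    sd = sum(soft_target_loss(b, final_logits.detach(), T) for b in branch_logits)
    sd = sd / max(len(branch_logits), 1)
    return ce + alpha * kd + sd

@torch.no_grad()
def adaptive_predict(branch_logits, entropy_threshold=0.3):
    """Inference for a single example: exit at the first branch whose prediction
    entropy falls below the threshold; lowering the threshold trades accuracy for speed."""
    for logits in branch_logits:                       # shallow -> deep classifiers
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:         # confident enough: stop early
            return probs.argmax(dim=-1)
    return F.softmax(branch_logits[-1], dim=-1).argmax(dim=-1)
```

In a setup like this, the entropy threshold is the knob that yields the speed-accuracy trade-off the abstract refers to: a stricter (lower) threshold forces more examples through deeper layers, while a looser one lets most examples exit early.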