Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 211200181-6. doi: 10.11896/jsjkx.211200181

• Artificial Intelligence •


Text Classification Based on Knowledge Distillation Model ELECTRA-base-BiLSTM

HUANG Yu-jiao1, ZHAN Li-chao1, FAN Xing-gang1, XIAO Jie2, LONG Hai-xia2   

  1 College of Zhijiang,Zhejiang University of Technology,Shaoxing,Zhejiang 312030,China
    2 College of Computer Science and Technology,Zhejiang University of Technology,Hangzhou 310000,China
  • Online: 2022-11-10  Published: 2022-11-21
  • Corresponding author: HUANG Yu-jiao (huangyuajiao@zjut.edu.cn)
  • About author: HUANG Yu-jiao, born in 1985, Ph.D, associate professor. Her main research interests include deep learning, text data analysis and dynamic characteristics of neural networks.
  • Supported by:
    National Natural Science Foundation of China (61972354, 62106225) and Natural Science Foundation of Zhejiang Province, China (LY20F020024, LZ22F020011).


Abstract: Text sentiment analysis is widely used in word-of-mouth analysis, topic monitoring and public opinion analysis, and is one of the most active research areas in natural language processing. Pre-trained language models in deep learning can handle problems in text sentiment classification such as polysemy and the influence of a word's part of speech and position. However, these models are complex and contain a large number of parameters, which leads to heavy resource consumption and makes them difficult to deploy. To address these problems, this paper adopts the idea of knowledge distillation, taking the ELECTRA pre-trained model as the teacher and BiLSTM as the student, and proposes a distillation model based on ELECTRA-base-BiLSTM. The one-hot encoded word vector representation of the text is used as the input of the distillation model for Chinese text sentiment classification. Experiments compare the distillation results of seven teacher models: ALBERT-tiny, ALBERT-base, BERT-base, BERT-wwm-ext, ERNIE-1.0, ERNIE-GRAM and ELECTRA-base. The ELECTRA-base-BiLSTM distillation model achieves the highest accuracy, precision and comprehensive evaluation index, and the best sentiment classification performance; it obtains results close to those of the ELECTRA language model, with a classification accuracy 5.58% higher than that of the lightweight shallow BiLSTM model. The proposed model reduces the complexity and resource consumption of the ELECTRA model while improving the Chinese text sentiment classification performance of the lightweight BiLSTM model, and provides a useful reference for further research on text sentiment classification.
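To make the distillation setup described in the abstract concrete, the sketch below pairs a Chinese ELECTRA-base teacher with a BiLSTM student trained on a weighted sum of hard-label cross-entropy and teacher-logit matching (the logit-matching objective used by Tang et al. [9]). It is a minimal illustration under stated assumptions, not the authors' implementation: the checkpoint name hfl/chinese-electra-base-discriminator, the embedding and hidden sizes, and the loss weight alpha are illustrative choices.

```python
# A minimal, hypothetical sketch of ELECTRA-base -> BiLSTM knowledge distillation.
# Checkpoint name, model sizes, loss weighting and training loop are assumptions
# for illustration, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

NUM_CLASSES = 2  # e.g. positive / negative sentiment

# Teacher: a Chinese ELECTRA-base classifier (assumed to have been fine-tuned on the
# sentiment task beforehand; the checkpoint name below is an assumption).
TEACHER_NAME = "hfl/chinese-electra-base-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(TEACHER_NAME)
teacher = ElectraForSequenceClassification.from_pretrained(TEACHER_NAME, num_labels=NUM_CLASSES)
teacher.eval()  # the teacher is frozen during distillation


class BiLSTMStudent(nn.Module):
    """Lightweight student: embedding lookup (equivalent to one-hot x embedding matrix),
    a single BiLSTM layer, mean pooling and a linear classification head."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids):
        x = self.embed(input_ids)      # (batch, seq_len, embed_dim)
        out, _ = self.bilstm(x)        # (batch, seq_len, 2 * hidden_dim)
        pooled = out.mean(dim=1)       # average over time steps
        return self.fc(pooled)         # (batch, num_classes) logits


student = BiLSTMStudent(vocab_size=tokenizer.vocab_size)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)


def distillation_step(texts, labels, alpha=0.5):
    """One training step: hard-label cross-entropy plus MSE between student and teacher
    logits (logit matching as in Tang et al. [9]); alpha is an assumed weighting."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        teacher_logits = teacher(**enc).logits       # soft targets from the teacher
    student_logits = student(enc["input_ids"])       # student sees the same token ids
    loss = alpha * F.cross_entropy(student_logits, labels) + \
        (1 - alpha) * F.mse_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Example usage with toy data (labels are class indices as a LongTensor):
# loss = distillation_step(["这部电影很好看", "质量太差了"], torch.tensor([1, 0]))
```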

Key words: Text sentiment classification, Pre-trained language model, Model compression, Knowledge distillation, ELECTRA-base-BiLSTM

CLC Number: TP391

References
[1]DAI A M,LE Q V.Semi-supervised Sequence Learning[EB/OL].(2015-11-04) [2021-11-28].https://arxiv.org/pdf/1511.01432.pdf.
[2]PETERS M E,NEUMANN M,IYYER M,et al.Deep contextualized word representations[C]//Proceedings of NAACL-HLT.2018:2227-2237.
[3]ZHENG X,LIANG P J.Chinese Sentiment Analysis Using Bidirectional LSTM with Word Embedding[C]//ICCCS.Proceedings of the 2nd International Conference on Cloud Computing and Security.Springer,2016:601-610.
[4]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of NAACL-HLT 2019.Association for Computational Linguistics,2019:4171-4186.
[5]SUN Y,WANG S,LI Y,et al.ERNIE:Enhanced Representation through Knowledge Integration[EB/OL].(2019-04-19) [2021-11-28].https://arxiv.org/pdf/1904.09223v1.pdf.
[6]LAN Z,CHEN M,GOODMAN S,et al.ALBERT:A Lite BERT for Self-supervised Learning of Language Representations[EB/OL].(2019-09-26) [2021-11-28].https://arxiv.org/pdf/1909.11942v3.pdf.
[7]CLARK K,LUONG M T,LE Q V,et al.ELECTRA:Pre-training Text Encoders as Discriminators Rather Than Generators[EB/OL].(2020-03-23) [2021-11-28].https://arxiv.org/pdf/2003.10555.pdf.
[8]HINTON G,VINYALS O,DEAN J.Distilling the Knowledge in a Neural Network[J].Computer Science,2015,14(7):38-39.
[9]TANG R,LU Y,LIU L,et al.Distilling Task-Specific Knowledge from BERT into Simple Neural Networks[EB/OL].(2019-03-28) [2021-11-28].https://arxiv.org/pdf/1903.12136.pdf.
[10]SUN S,CHENG Y,GAN Z,et al.Patient Knowledge Distillation for BERT Model Compression[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing(EMNLP-IJCNLP).2019:4314-4323.
[11]CUI Y,CHE W,LIU T,et al.Revisiting Pre-Trained Models for Chinese Natural Language Processing[EB/OL].(2020-11-02) [2021-11-28].https://arxiv.org/pdf/2004.13922v2.pdf.
[12]TAN S,ZHANG J.An empirical study of sentiment analysis for chinese documents[J].Expert Systems with Applications,2008,34(4):2622-2629.