Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 211200181-6. doi: 10.11896/jsjkx.211200181

• Artificial Intelligence •

Text Classification Based on Knowledge Distillation Model ELECTRA-base-BiLSTM

HUANG Yu-jiao1, ZHAN Li-chao1, FAN Xing-gang1, XIAO Jie2, LONG Hai-xia2   

  1 College of Zhijiang, Zhejiang University of Technology, Shaoxing, Zhejiang 312030, China
    2 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310000, China
  • Online:2022-11-10 Published:2022-11-21
  • About author: HUANG Yu-jiao, born in 1985, Ph.D, associate professor. Her main research interests include deep learning, text data analysis and dynamic characteristics of neural networks.
  • Supported by:
    National Natural Science Foundation of China (61972354, 62106225) and Natural Science Foundation of Zhejiang Province, China (LY20F020024, LZ22F020011).

Abstract: Text emotion analysis is widely used in word-of-mouth analysis, topic monitoring and public opinion analysis, and is one of the most active research areas in natural language processing. In text emotion classification, pre-trained language models in deep learning can handle polysemy as well as the part of speech and position of a word in the text. However, these models are complex and contain a large number of parameters, which leads to heavy resource consumption and makes deployment difficult. To address these problems, a distillation model based on ELECTRA-base-BiLSTM is proposed, which applies the idea of knowledge distillation with the ELECTRA pre-trained model as the teacher and BiLSTM as the student. Word vectors obtained from one-hot encoded text are fed into the distillation model to classify the emotion of Chinese text. Experiments compare the distillation results of seven teacher models: ALBERT-tiny, ALBERT-base, BERT-base, BERT-wwm-ext, ERNIE-1.0, ERNIE-GRAM and ELECTRA-base. Among them, the ELECTRA-base-BiLSTM distillation model achieves the highest accuracy, precision and comprehensive evaluation indicators and the best emotion classification performance, producing results close to those of the ELECTRA language model, with a classification accuracy 5.58% higher than that of the lightweight shallow BiLSTM model. The proposed model not only reduces the complexity and resource consumption of the ELECTRA model, but also improves the Chinese text emotion classification performance of the lightweight BiLSTM model, providing a useful reference for subsequent research on text emotion classification.
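The teacher-student setup described in the abstract can be sketched roughly as below, assuming PyTorch and Hugging Face Transformers. The checkpoint name hfl/chinese-electra-base-discriminator, the embedding and hidden sizes, the temperature, the loss weighting and the max-pooling step are illustrative assumptions rather than settings reported in the paper; the nn.Embedding lookup plays the role of the one-hot word-vector input mentioned in the abstract.

```python
# Minimal sketch of ELECTRA-base (teacher) -> BiLSTM (student) distillation,
# assuming PyTorch + Hugging Face Transformers. Hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

NUM_CLASSES = 2      # binary sentiment (assumption)
TEMPERATURE = 2.0    # softening temperature for soft targets (assumption)
ALPHA = 0.5          # weight between hard-label loss and distillation loss (assumption)

# Teacher: a Chinese ELECTRA-base checkpoint. In practice it would be fine-tuned
# on the sentiment data first; the classification head here is freshly initialized.
teacher_name = "hfl/chinese-electra-base-discriminator"   # assumed checkpoint
tokenizer = ElectraTokenizerFast.from_pretrained(teacher_name)
teacher = ElectraForSequenceClassification.from_pretrained(
    teacher_name, num_labels=NUM_CLASSES
).eval()

class BiLSTMStudent(nn.Module):
    """Lightweight student: embedding lookup -> BiLSTM -> linear classifier."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, NUM_CLASSES)

    def forward(self, input_ids):
        emb = self.embed(input_ids)        # (B, L, E)
        out, _ = self.bilstm(emb)          # (B, L, 2H)
        pooled, _ = out.max(dim=1)         # max-pool over time (assumption)
        return self.classifier(pooled)     # (B, C) logits

student = BiLSTMStudent(vocab_size=tokenizer.vocab_size)

def distillation_loss(student_logits, teacher_logits, labels,
                      T=TEMPERATURE, alpha=ALPHA):
    """Hinton-style objective: hard-label CE + temperature-scaled KL to the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# One illustrative training step on two toy Chinese reviews.
texts = ["这部电影太好看了", "剧情拖沓，不推荐"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    teacher_logits = teacher(**batch).logits   # soft targets from the teacher
student_logits = student(batch["input_ids"])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The same loop can be repeated with each of the seven teacher models listed in the abstract to reproduce the kind of comparison the paper reports; only the teacher checkpoint changes, while the BiLSTM student and the loss stay fixed.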

Key words: Text emotion classification, Pre-training language model, Model compression, Knowledge distillation, ELECTRA-base-BiLSTM

CLC Number: 

  • TP391
[1]DAI A M,LE Q V.Semi-supervised Sequence Learning[EB/OL].(2015-11-04) [2021-11-28].https://arxiv.org/pdf/1511.01432.pdf.
[2]PETERS M E,NEUMANN M,IYYER M,et al.Deep contextualized word representations[C]//Proceedings of NAACL-HLT.2018:2227-2237.
[3]ZHENG X,LIANG P J.Chinese Sentiment Analysis Using Bidirectional LSTM with Word Embedding[C]//Proceedings of the 2nd International Conference on Cloud Computing and Security(ICCCS).Springer,2016:601-610.
[4]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//Proceedings of NAACL-HLT 2019.2019:4171-4186.
[5]SUN Y,WANG S,LI Y,et al.ERNIE:Enhanced Representation through Knowledge Integration[EB/OL].(2019-04-19) [2021-11-28].https://arxiv.org/pdf/1904.09223v1.pdf.
[6]LAN Z,CHEN M,GOODMAN S,et al.ALBERT:A Lite BERT for Self-supervised Learning of Language Representations[EB/OL].(2019-09-26) [2021-11-28].https://arxiv.org/pdf/1909.11942v3.pdf.
[7]CLARK K,LUONG M T,LE Q V,et al.ELECTRA:Pre-training Text Encoders as Discriminators Rather Than Generators[EB/OL].(2020-03-23) [2021-11-28].https://arxiv.org/pdf/2003.10555.pdf.
[8]HINTON G,VINYALS O,DEAN J.Distilling the Knowledge in a Neural Network[J].Computer Science,2015,14(7):38-39.
[9]TANG R,LU Y,LIU L,et al.Distilling Task-Specific Knowledge from BERT into Simple Neural Networks[EB/OL].(2019-03-28) [2021-11-28].https://arxiv.org/pdf/1903.12136.pdf.
[10]SUN S,CHENG Y,GAN Z,et al.Patient Knowledge Distillation for BERT Model Compression[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing(EMNLP-IJCNLP).2019:4314-4323.
[11]CUI Y,CHE W,LIU T,et al.Revisiting Pre-Trained Models for Chinese Natural Language Processing[EB/OL].(2020-11-02) [2021-11-28].https://arxiv.org/pdf/2004.13922v2.pdf.
[12]TAN S,ZHANG J.An empirical study of sentiment analysis for Chinese documents[J].Expert Systems with Applications,2008,34(4):2622-2629.