Computer Science (计算机科学) ›› 2022, Vol. 49 ›› Issue (7): 148-163. doi: 10.11896/jsjkx.211200018

• Artificial Intelligence •

Advances in Chinese Pre-training Models (中文预训练模型研究进展)

HOU Yu-tao (侯钰涛), ABULIZI Abudukelimu (阿布都克力木·阿布力孜), ABUDUKELIMU Halidanmu (哈里旦木·阿布都克里木)

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Received: 2021-12-02  Revised: 2022-04-17  Online: 2022-07-15  Published: 2022-07-12
  • Corresponding author: ABUDUKELIMU Halidanmu (abdklmhldm@gmail.com)
  • About author: HOU Yu-tao (hyt1159871021@163.com), born in 1998, postgraduate, is a student member of China Computer Federation. Her main research interests include natural language processing.
    ABUDUKELIMU Abulizi, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61866035, 61966033).

Abstract: In recent years, pre-training models have flourished in the field of natural language processing, aiming to model and represent the knowledge implicit in natural language. However, most mainstream pre-training models target the English domain, and work on the Chinese domain started relatively late. Given the importance of Chinese in natural language processing, extensive research has been carried out in both academia and industry, and numerous Chinese pre-training models have been proposed. This paper presents a comprehensive review of the research on Chinese pre-training models. It first introduces the basic concepts and development history of pre-training models and reviews Transformer and BERT, the two classical models on which most Chinese pre-training models are built. It then proposes a classification scheme that groups Chinese pre-training models by category, and summarizes the evaluation benchmarks available for the Chinese domain. Finally, it discusses future development trends of Chinese pre-training models. The survey aims to help researchers gain a more comprehensive understanding of the development of Chinese pre-training models and to provide ideas for the design of new models.

Key words: Chinese pre-training models, Deep learning, Natural language processing, Pre-training, Word embedding

CLC Number: TP391
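
As a brief illustration of how the BERT-style Chinese pre-training models surveyed here are typically consumed downstream, the following minimal sketch loads a publicly released Chinese checkpoint with the Hugging Face transformers library and performs masked-character prediction. The checkpoint name bert-base-chinese and the example sentence are illustrative assumptions, not artifacts of this survey.

```python
# Minimal usage sketch (assumes the publicly released "bert-base-chinese"
# checkpoint and the Hugging Face `transformers` library are available);
# this is not code from any of the surveyed papers.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

# Masked-language-modelling query: [MASK] stands for one missing Chinese
# character, which the pre-trained model is asked to recover.
text = "北京是[MASK]国的首都。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and list the five most likely characters.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1][0]
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))  # "中" is expected near the top
```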