Computer Science ›› 2022, Vol. 49 ›› Issue (7): 148-163.doi: 10.11896/jsjkx.211200018

• Artificial Intelligence •

Advances in Chinese Pre-training Models

HOU Yu-tao, ABULIZI Abudukelimu, ABUDUKELIMU Halidanmu   

  1. School of Information Management,Xinjiang University of Finance and Economics,Urumqi 830012,China
  • Received:2021-12-02 Revised:2022-04-17 Online:2022-07-15 Published:2022-07-12
  • About author:HOU Yu-tao,born in 1998,postgraduate,is a student member of China Computer Federation.Her main research interests include natural language processing.
    ABULIZI Abudukelimu,born in 1978,Ph.D,associate professor,is a member of China Computer Federation.Her main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China(61866035,61966033).

Abstract: In recent years, pre-training models have flourished in the field of natural language processing, aiming to model and represent the implicit knowledge of natural language. However, most mainstream pre-training models target English, and work on Chinese pre-training started relatively late. Given the importance of Chinese in natural language processing, extensive research has been conducted in both academia and industry, and numerous Chinese pre-training models have been proposed. This paper presents a comprehensive review of research on Chinese pre-training models. It first introduces the basic concepts of pre-training models and their development history, then reviews Transformer and BERT, the two classical models on which most Chinese pre-training models are built, then proposes a classification of Chinese pre-training models by model category, and summarizes the evaluation benchmarks used in the Chinese domain. Finally, future development trends of Chinese pre-training models are discussed. The aim is to help researchers gain a more comprehensive understanding of the development of Chinese pre-training models and to provide ideas for the design of new models.
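As a lightweight illustration of the kind of model this survey covers (not part of the paper itself), the sketch below loads a publicly released Chinese BERT-style checkpoint through the Hugging Face transformers library and queries it as a masked language model; the checkpoint id hfl/chinese-bert-wwm-ext and the example sentence are assumptions chosen only for demonstration.

    # Illustrative sketch only (not from the paper): querying a Chinese
    # BERT-style pre-training model of the kind this survey reviews.
    # The checkpoint id "hfl/chinese-bert-wwm-ext" is an assumed example
    # of a publicly released Chinese model hosted on the Hugging Face Hub.
    from transformers import pipeline

    # Build a masked-language-modelling pipeline around the Chinese checkpoint.
    fill_mask = pipeline("fill-mask", model="hfl/chinese-bert-wwm-ext")

    # Ask the model to fill in the masked character of a Chinese sentence.
    for candidate in fill_mask("北京是中国的首[MASK]。"):
        print(candidate["token_str"], candidate["score"])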

Key words: Chinese pre-training models, Deep learning, Natural language processing, Pre-training, Word embedding

CLC Number: 

  • TP391