Computer Science, 2022, Vol. 49, Issue (11A): 210800125-12. DOI: 10.11896/jsjkx.210800125

• Artificial Intelligence •

Survey of Research on Extended Models of Pre-trained Language Models

Abudukelimu ABULIZI¹,², ZHANG Yu-ning¹, Alimujiang YASEN¹, GUO Wen-qiang¹, Abudukelimu HALIDANMU¹,²

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2. Institute of Silk Road Economy and Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online: 2022-11-10  Published: 2022-11-21
  • About author: Abudukelimu ABULIZI, born in 1983, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include cognitive neuroscience, artificial intelligence and big data mining.
    Abudukelimu HALIDANMU, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61866035, 61966033), 2018 High-level Talented Person Project of Department of Human Resources and Social Security of Xinjiang Uyghur Autonomous Region (40050027), 2018 Tianchi Ph.D Program Scientific Research Fund of Science and Technology Department of Xinjiang Uyghur Autonomous Region (40050033) and National Key Research and Development Program of China (2018YFC0825504).

Abstract: In recent years, the introduction of the Transformer neural network has greatly advanced pre-training technology, and deep learning-based pre-trained models have become a research hotspot in natural language processing. Since late 2018, BERT has achieved state-of-the-art results on multiple natural language processing tasks; a series of improved pre-trained models built on BERT have since been proposed, and extended pre-trained models designed for various scenarios have also emerged. The expansion of pre-trained models from single-language settings to cross-language, cross-modal and lightweight scenarios has ushered natural language processing into a new era of pre-training. This paper surveys the research methods and conclusions of lightweight pre-trained models, knowledge-incorporated pre-trained models, cross-modal pre-trained language models and cross-language pre-trained language models, as well as the main challenges these extended models face. Finally, four possible research trends for the further development of extended models are proposed, providing theoretical support for beginners who want to learn and understand pre-trained models.
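
The lightweight models covered by this survey (e.g., DistilBERT, TinyBERT, MiniLM) rely heavily on knowledge distillation, in which a small student network is trained to match the temperature-softened output distribution of a large teacher such as BERT. Below is a minimal illustrative PyTorch sketch of the soft-label distillation objective; the function name, temperature and loss weighting are placeholder assumptions for illustration, not the exact recipe of any specific model discussed in the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Illustrative soft-label knowledge distillation loss.

    Combines the KL divergence between the temperature-softened teacher
    and student distributions with the usual cross-entropy on hard labels.
    `temperature` and `alpha` are assumed hyper-parameters.
    """
    # Softened distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

if __name__ == "__main__":
    # Toy example: batch of 4 samples, 3-way classification.
    student_logits = torch.randn(4, 3, requires_grad=True)
    teacher_logits = torch.randn(4, 3)
    labels = torch.tensor([0, 2, 1, 0])
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")

In practice, the surveyed models usually go beyond output logits and also have the student imitate intermediate hidden states and attention distributions of the teacher (as in TinyBERT and MiniLM).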

Key words: Natural language processing, Pre-training, Lightweight, Knowledge-incorporated, Cross-modal, Cross-language

CLC Number: TP391