计算机科学 ›› 2022, Vol. 49 ›› Issue (11A): 210800125-12. doi: 10.11896/jsjkx.210800125

• 人工智能 •

预训练语言模型的扩展模型研究综述

阿布都克力木·阿布力孜1,2, 张雨宁1, 阿力木江·亚森1, 郭文强1, 哈里旦木·阿布都克里木1,2   

  1 新疆财经大学信息管理学院 乌鲁木齐 830012
  2 新疆财经大学丝路经济与管理研究院 乌鲁木齐 830012
  • 出版日期:2022-11-10 发布日期:2022-11-21
  • 通讯作者: 哈里旦木·阿布都克里木(abdklmhldm@gmail.com)
  • 作者简介:(keram1106@163.com)
  • 基金资助:
    国家自然科学基金项目(61866035,61966033);2018年度自治区高层次人才引进项目(40050027);2018年度自治区科学技术厅天池博士项目(40050033);国家重点研发专项(2018YFC0825504)

Survey of Research on Extended Models of Pre-trained Language Models

Abudukelimu ABULIZI1,2, ZHANG Yu-ning1, Alimujiang YASEN1, GUO Wen-qiang1, Abudukelimu HALIDANMU1,2   

  1 School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2 Institute of Silk Road Economy and Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online:2022-11-10 Published:2022-11-21
  • About author: Abudukelimu ABULIZI, born in 1983, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include cognitive neuroscience, artificial intelligence and big data mining.
    Abudukelimu HALIDANMU, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61866035, 61966033), 2018 High-level Talented Person Project of Department of Human Resources and Social Security of Xinjiang Uyghur Autonomous Region (40050027), 2018 Tianchi Ph.D Program Scientific Research Fund of Science and Technology Department of Xinjiang Uyghur Autonomous Region (40050033) and National Key Research and Development Program of China (2018YFC0825504).

摘要: 近些年,Transformer神经网络的提出,大大推动了预训练技术的发展。目前,基于深度学习的预训练模型已成为了自然语言处理领域的研究热点。自2018年底BERT在多个自然语言处理任务中达到了最优效果以来,一系列基于BERT改进的预训练模型相继被提出,也出现了针对各种场景而设计的预训练模型扩展模型。预训练模型从单语言扩展到跨语言、多模态、轻量化等任务,使得自然语言处理进入了一个全新的预训练时代。主要对轻量化预训练模型、融入知识的预训练模型、跨模态预训练语言模型、跨语言预训练语言模型的研究方法和研究结论进行梳理,并对预训练模型扩展模型面临的主要挑战进行总结,提出了4种扩展模型可能发展的研究趋势,为学习和理解预训练模型的初学者提供理论支持。

关键词: 自然语言处理, 预训练, 轻量化, 知识融合, 多模态, 跨语言

Abstract: In recent years, the introduction of the Transformer neural network has greatly promoted the development of pre-training technology, and pre-trained models based on deep learning have become a research hotspot in natural language processing. Since BERT achieved state-of-the-art results on multiple natural language processing tasks at the end of 2018, a series of improved pre-trained models based on BERT have been proposed one after another, along with extended models designed for various scenarios. The expansion of pre-trained models from monolingual tasks to cross-lingual, multi-modal and lightweight ones has brought natural language processing into a new era of pre-training. This paper reviews the research methods and conclusions of lightweight pre-trained models, knowledge-incorporated pre-trained models, cross-modal pre-trained language models and cross-lingual pre-trained language models, summarizes the main challenges faced by these extended models, and proposes four possible research trends for their future development, providing theoretical support for beginners who want to learn and understand pre-trained models.

Key words: Natural language processing, Pre-training, Lightweight, Knowledge-incorporated, Cross-modal, Cross-language
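
The survey's four extension directions can be made concrete with a small, illustrative sketch. The code below is not part of the paper; it assumes the open-source Hugging Face transformers library (with PyTorch) and its publicly released checkpoints bert-base-uncased, distilbert-base-uncased and xlm-roberta-base, and simply contrasts the parameter counts of an original, a distilled lightweight and a cross-lingual pre-trained model.

    # Illustrative sketch only (not from the paper): compare the sizes of an original,
    # a knowledge-distilled (lightweight) and a cross-lingual pre-trained model.
    # Assumes the Hugging Face `transformers` library and its public checkpoints.
    from transformers import AutoModel

    for name in ["bert-base-uncased",        # original BERT encoder
                 "distilbert-base-uncased",  # distilled, lightweight variant
                 "xlm-roberta-base"]:        # cross-lingual variant
        model = AutoModel.from_pretrained(name)
        n_params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {n_params / 1e6:.1f}M parameters")

Running this shows the distilled model with roughly 40% fewer parameters than BERT-base, which is the kind of size/accuracy trade-off pursued by the lightweight models reviewed in this survey; the knowledge-incorporated and cross-modal directions instead extend the pre-trained backbone with external knowledge graphs and with image or video inputs, respectively.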

中图分类号: TP391