Computer Science, 2022, Vol. 49, Issue (11A): 210800125-12. DOI: 10.11896/jsjkx.210800125

• Artificial Intelligence •

Survey of Research on Extended Models of Pre-trained Language Models

Abudukelimu ABULIZI¹,², ZHANG Yu-ning¹, Alimujiang YASEN¹, GUO Wen-qiang¹, Abudukelimu HALIDANMU¹,²

  1. School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  2. Institute of Silk Road Economy and Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
  • Online: 2022-11-10  Published: 2022-11-21
  • About author: Abudukelimu ABULIZI, born in 1983, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include cognitive neuroscience, artificial intelligence and big data mining.
    Abudukelimu HALIDANMU, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (61866035, 61966033), 2018 High-level Talented Person Project of Department of Human Resources and Social Security of Xinjiang Uyghur Autonomous Region (40050027), 2018 Tianchi Ph.D Program Scientific Research Fund of Science and Technology Department of Xinjiang Uyghur Autonomous Region (40050033) and National Key Research and Development Program of China (2018YFC0825504).

Abstract: In recent years, the introduction of the Transformer neural network has greatly advanced pre-training technology, and deep learning-based pre-trained models have become a research hotspot in natural language processing. Since late 2018, BERT has achieved state-of-the-art results on multiple natural language processing tasks; a series of improved pre-trained models built on BERT have since been proposed, and extended pre-trained models designed for various scenarios have also emerged. The expansion of pre-trained models from single-language settings to cross-language, cross-modal and lightweight scenarios has ushered natural language processing into a new era of pre-training. This paper surveys the research methods and conclusions of lightweight pre-trained models, knowledge-incorporated pre-trained models, cross-modal pre-trained language models and cross-language pre-trained language models, as well as the main challenges these extended models face. Finally, four possible research trends for the further development of extended models are proposed, providing theoretical support for beginners who want to learn and understand pre-trained models.
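
The lightweight models covered by this survey (e.g., DistilBERT, TinyBERT, MiniLM) rely heavily on knowledge distillation, in which a small student network is trained to match the temperature-softened output distribution of a large teacher such as BERT. Below is a minimal illustrative PyTorch sketch of the soft-label distillation objective; the function name, temperature and loss weighting are placeholder assumptions for illustration, not the exact recipe of any specific model discussed in the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Illustrative soft-label knowledge distillation loss.

    Combines the KL divergence between the temperature-softened teacher
    and student distributions with the usual cross-entropy on hard labels.
    `temperature` and `alpha` are assumed hyper-parameters.
    """
    # Softened distributions; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

if __name__ == "__main__":
    # Toy example: batch of 4 samples, 3-way classification.
    student_logits = torch.randn(4, 3, requires_grad=True)
    teacher_logits = torch.randn(4, 3)
    labels = torch.tensor([0, 2, 1, 0])
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")

In practice, the surveyed models usually go beyond output logits and also have the student imitate intermediate hidden states and attention distributions of the teacher (as in TinyBERT and MiniLM).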

Key words: Natural language processing, Pre-training, Lightweight, Knowledge-incorporated, Cross-modal, Cross-language

CLC Number: TP391