计算机科学 (Computer Science) ›› 2022, Vol. 49 ›› Issue (11A): 210800125-12. DOI: 10.11896/jsjkx.210800125
阿布都克力木·阿布力孜1,2, 张雨宁1, 阿力木江·亚森1, 郭文强1, 哈里旦木·阿布都克里木1,2
Abudukelimu ABULIZI1,2, ZHANG Yu-ning1, Alimujiang YASEN1, GUO Wen-qiang1, Abudukelimu HALIDANMU1,2
Abstract: In recent years, the introduction of the Transformer neural network has greatly advanced pre-training techniques, and pre-trained models based on deep learning have become a research hotspot in natural language processing. Since BERT achieved state-of-the-art results on multiple natural language processing tasks at the end of 2018, a series of improved pre-trained models based on BERT have been proposed, and extended pre-trained models designed for various scenarios have also emerged. Pre-trained models have expanded from monolingual settings to cross-lingual, multimodal, and lightweight tasks, ushering natural language processing into a new pre-training era. This paper reviews the research methods and findings on lightweight pre-trained models, knowledge-enhanced pre-trained models, cross-modal pre-trained language models, and cross-lingual pre-trained language models, summarizes the main challenges facing these extended pre-trained models, and outlines possible research trends for the four types of extended models, providing theoretical support for beginners seeking to learn and understand pre-trained models.
CLC Number:
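The lightweight pre-trained models surveyed above are typically obtained through techniques such as knowledge distillation (as in DistilBERT-style training). The following PyTorch snippet is a minimal illustrative sketch only, not taken from the paper: it shows the standard temperature-scaled distillation objective, where the temperature T, the weight alpha, and the batch sizes are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL divergence between temperature-scaled
        # teacher and student distributions, rescaled by T^2.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: ordinary cross-entropy against gold labels.
        hard = F.cross_entropy(student_logits, labels)
        # alpha balances imitating the teacher and fitting the data.
        return alpha * soft + (1 - alpha) * hard

    # Toy usage: random tensors stand in for real model outputs.
    logits_s = torch.randn(8, 2)           # student logits, batch of 8, 2 classes
    logits_t = torch.randn(8, 2)           # teacher logits
    y = torch.randint(0, 2, (8,))          # gold labels
    loss = distillation_loss(logits_s, logits_t, y)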