Computer Science (计算机科学), 2023, Vol. 50, Issue (1): 176-184. doi: 10.11896/jsjkx.220800223
SUN Kaili (孙凯丽)1,2, LUO Xudong (罗旭东)1,2, Michael Y. LUO (罗有容)3
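Abstract: In recent years, pre-trained language models have developed rapidly, pushing natural language processing into a new stage of development. This survey aims to help researchers understand where and how powerful pre-trained language models are applied to natural language processing. Specifically, it first briefly reviews typical pre-trained models, including monolingual, multilingual, and Chinese pre-trained models. It then discusses the contributions of these models to five natural language processing tasks: information extraction, sentiment analysis, question answering, text summarization, and machine translation. Finally, it discusses some of the challenges facing the application of pre-trained models.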