Computer Science ›› 2023, Vol. 50 ›› Issue (1): 176-184. doi: 10.11896/jsjkx.220800223

• Artificial Intelligence •

Survey of Applications of Pretrained Language Models

SUN Kaili1,2, LUO Xudong1,2, Michael Y. LUO3

  1 Guangxi Key Lab of Multi-source Information Mining & Security, Guilin, Guangxi 541004, China
    2 School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, Guangxi 541004, China
    3 Emmanuel College, Cambridge University, Cambridge CB2 3AP, UK
  • Received: 2022-08-23  Revised: 2022-10-10  Online: 2023-01-15  Published: 2023-01-09
  • About author: SUN Kaili, born in 1997, postgraduate. Her main research interests include artificial intelligence and sentiment analysis.
    LUO Xudong, born in 1963, Ph.D, distinguished professor, Ph.D supervisor. His main research interests include natural language processing, intelligent decision-making, game theory, automated negotiation and fuzzy logic.
  • Supported by:
    Guangxi Key Lab of Multi-source Information Mining & Security (22-A-01-02).

Abstract: In recent years, pretrained language models have developed rapidly, pushing natural language processing into a whole new stage of development. To help researchers understand where and how these powerful pretrained language models can be applied in natural language processing, this paper surveys the state of the art of their applications. Specifically, we first briefly review typical pretrained language models, including monolingual, multilingual and Chinese pretrained models. Then, we discuss the contributions of these pretrained language models to five natural language processing tasks: information extraction, sentiment analysis, question answering, text summarization, and machine translation. Finally, we discuss some challenges faced by the applications of pretrained language models.
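The survey itself is not tied to any particular toolkit; purely as an illustrative sketch (the library, model checkpoint and example inputs below are assumptions, not drawn from the paper), the following Python snippet shows one common way a pretrained language model is applied to sentiment analysis, one of the five tasks discussed:

    # Illustrative sketch only: applying an off-the-shelf fine-tuned BERT-family
    # model to sentiment analysis via the Hugging Face transformers pipeline API.
    # The model checkpoint named here is an assumption chosen for the example.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    # Each prediction is a dict containing a sentiment label and a confidence score.
    texts = [
        "Pretrained language models are remarkably versatile.",
        "Fine-tuning on a tiny dataset can still overfit badly.",
    ]
    for result in classifier(texts):
        print(result["label"], round(result["score"], 4))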

Key words: Pretrained language model, Natural language processing, Deep learning, Information extraction, Sentiment analysis, Question answering system, Text summarization, Machine translation

CLC Number: TP391