Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (6A): 230700112-8. doi: 10.11896/jsjkx.230700112

• Artificial Intelligence •

Recent Progress on Machine Translation Based on Pre-trained Language Models

YANG Binxia, LUO Xudong, SUN Kaili   

  1. School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, Guangxi 541004, China
     Key Laboratory of Blockchain and Intelligent Technology in Education, Ministry of Education, Guilin, Guangxi 541004, China
     Guangxi Key Lab of Multi-source Information Mining & Security, Guilin, Guangxi 541004, China
  • Published: 2024-06-06
  • Corresponding author: LUO Xudong (luoxd@mailbox.gxnu.edu.cn)
  • About author: YANG Binxia, born in 1999, postgraduate (binxiay@stu.gxnu.edu.cn). Her main research interest is natural language processing (NLP).
    LUO Xudong, born in 1963, Ph.D, distinguished professor, Ph.D supervisor. His main research interests include natural language processing, intelligent decision-making, game theory, automated negotiation and fuzzy logic.
  • Supported by:
    Systematic Research Project of the Guangxi Key Lab of Multi-source Information Mining & Security (22-A-01-02).

Abstract: Natural language processing (NLP) involves many important topics, one of which is machine translation (MT). Pre-trained language models (PLMs), such as BERT and GPT, are state-of-the-art approaches for various NLP tasks, including MT. Therefore, many researchers use PLMs to solve MT problems. To push the research forward, this paper provides an overview of recent advances in this field, including the main research questions and solutions based on various PLMs. We compare the motivations, commonalities, differences and limitations of these solutions, and summarise the datasets commonly used to train such MT models, as well as the metrics used to evaluate them. Finally, further research directions are discussed.
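As a concrete illustration of the workflow shared by the systems this survey covers (translate with a pre-trained model, then score hypotheses against references), the short sketch below uses a publicly available MarianMT checkpoint from the Hugging Face hub and sacreBLEU. The model name, example sentences and library choices are illustrative assumptions and are not taken from the paper.

# Minimal sketch (assumptions noted below): PLM-based translation plus BLEU evaluation.
from transformers import MarianMTModel, MarianTokenizer
import sacrebleu

# Assumed off-the-shelf German-to-English checkpoint; any MarianMT language pair works the same way.
model_name = "Helsinki-NLP/opus-mt-de-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Toy source sentence and reference translation (illustrative only).
sources = ["Vortrainierte Sprachmodelle verbessern die maschinelle Übersetzung."]
references = [["Pre-trained language models improve machine translation."]]

# Tokenise, generate hypotheses with the pre-trained model, and decode back to text.
batch = tokenizer(sources, return_tensors="pt", padding=True)
outputs = model.generate(**batch, max_new_tokens=64)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Corpus-level BLEU, the metric most commonly reported by the surveyed systems.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(hypotheses[0])
print(f"BLEU = {bleu.score:.2f}")

The approaches surveyed in the paper differ mainly in how the PLM is incorporated (for example, fused into the NMT encoder, used as initialisation, or prompted directly), but evaluation typically follows this translate-then-score pattern with metrics such as BLEU.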

Key words: Natural language processing, Machine translation, Pre-trained language model, BERT, GPT

CLC number: TP391

References:
[1]BAHDANAU D,CHO K H,BENGIO Y.Neural machine translation by jointly learning to align and translate[C]//3rd International Conference on Learning Representations(ICLR).2015.
[2]ZHANG Z,WU S,JIANG D,et al.BERT-JAM:Maximizing the utilization of BERT for neural machine translation[J].Neurocomputing,2021,460:84-94.
[3]SUN K,LUO X,LUO M Y.A survey of pretrained language models[C]//International Conference on Knowledge Science,Engineering and Management.Cham:Springer International Publishing,2022:442-456.
[4]RIVERA-TRIGUEROS I.Machine translation systems and quality assessment:a systematic review[J].Language Resources and Evaluation,2022,56(2):593-619.
[5]RANATHUNGA S,LEE E S A,SKENDULI M P,et al.Neural machine translation for low-resource languages:A survey[J].ACM Computing Surveys,2023,55(11):1-37.
[6]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of NAACL-HLT.2019:4171-4186.
[7]RADFORD A,NARASIMHAN K,SALIMANS T,et al.Improving language understanding by generative pre-training[EB/OL].https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
[8]CONNEAU A,LAMPLE G.Cross-lingual language model pretraining[C]//Advances in Neural Information Processing Systems.2019,32.
[9]LEWIS M,LIU Y,GOYAL N,et al.BART:Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,Translation,and Comprehension[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:7871-7880.
[10]YANG J,WANG M,ZHOU H,et al.Towards making the most of BERT in neural machine translation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:9378-9385.
[11]ZHU J,XIA Y,WU L,et al.Incorporating BERT into neural machine translation[C]//8th International Conference on Learning Representations(ICLR 2020).2020.
[12]ZHANG J R,LI H Z,SHI S M,et al.Dynamic Attention Aggregation with BERT for Neural Machine Translation[C]//2020 International Joint Conference on Neural Networks(IJCNN).IEEE,2020:1-8.
[13]SHAVARANI H S,SARKAR A.Better Neural Machine Translation by Extracting Linguistic Information from BERT[C]//Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics:Main Volume.2021:2772-2783.
[14]GUO J,ZHANG Z,XU L,et al.Adaptive adapters:An efficient way to incorporate BERT into neural machine translation[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2021,29:1740-1751.
[15]TRAN K.From English to foreign languages:Transferring pre-trained language models[EB/OL].https://arxiv.org/abs/2002.07306.
[16]MIYAZAKI T,MORITA Y,SANO M.Machine translation from spoken language to sign language using pre-trained language model as encoder[C]//Proceedings of the LREC 2020 9th Workshop on the Representation and Processing of Sign Languages:Sign Language Resources in the Service of the Language Community,Technological Challenges and Application Perspectives.2020:139-144.
[17]ÜSTÜN A,BÉRARD A,BESACIER L,et al.Multilingual Unsupervised Neural Machine Translation with Denoising Adapters[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:6650-6662.
[18]BRISKILAL J,SUBALALITHA C N.An ensemble model for classifying idioms and literal texts using BERT and RoBERTa[J].Information Processing & Management,2022,59(1):102756.
[19]LIU J,THOMA S.German to English:Fake News Detection with Machine Translation[J].Lecture Notes in Informatics,Gesellschaft für Informatik,2022,3457:1-8.
[20]ZHANG Z,WU S,JIANG D,et al.BERT-JAM:Maximizing the utilization of BERT for neural machine translation[J].Neurocomputing,2021,460:84-94.
[21]HAN J M,BABUSCHKIN I,EDWARDS H,et al.Unsupervised neural machine translation with generative language models only[EB/OL].https://arxiv.org/abs/2110.05448.
[22]TAN Z,ZHANG X,WANG S,et al.MSP:Multi-Stage Prompting for Making Pre-trained Language Models Better Translators[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2022:6131-6142.
[23]WENG R,YU H,HUANG S,et al.Acquiring knowledge from pre-trained model to neural machine translation[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:9266-9273.
[24]ZHANG B,NAGESH A,KNIGHT K.Parallel Corpus Filtering via Pre-trained Language Models[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:8545-8554.
[25]SAWAI R,PAIK I,KUWANA A.Sentence Augmentation for Language Translation Using GPT-2[J].Electronics,2021,10(24):3082.
[26]RUBINO R,SUMITA E.Intermediate self-supervised learning for machine translation quality estimation[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:4355-4360.
[27]LI Z,ZHAO H,WANG R,et al.SJTU-NICT’s Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task[C]//Proceedings of the Fifth Conference on Machine Translation.2020:218-229.
[28]CHEN G,MA S,CHEN Y,et al.Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:15-26.
[29]MA S,DONG L,HUANG S,et al.DeltaLM:Encoder-decoder pre-training for language generation and translation by augmenting pretrained multilingual encoders[EB/OL].https://arxiv.org/abs/2106.13736.
[30]SUN X,GE T,MA S,et al.A unified strategy for multilingual grammatical error correction with pre-trained cross-lingual language model[EB/OL].https://arxiv.org/abs/2201.10707.
[31]LIU Y,GU J,GOYAL N,et al.Multilingual denoising pre-training for neural machine translation[J].Transactions of the Association for Computational Linguistics,2020,8:726-742.
[32]WANG X,TU Z,SHI S.Tencent AI lab machine translation systems for the WMT21 biomedical translation task[C]//Proceedings of the Sixth Conference on Machine Translation.2021:874-878.
[33]DABRE R,SHROTRIYA H,KUNCHUKUTTAN A,et al.IndicBART:A Pre-trained Model for Indic Natural Language Generation[C]//Findings of the Association for Computational Linguistics:ACL 2022.2022:1849-1863.
[34]RIPPETH E,AGRAWAL S,CARPUAT M.Controlling Translation Formality Using Pre-trained Multilingual Language Models[C]//Proceedings of the 19th International Conference on Spoken Language Translation(IWSLT 2022).2022:327-340.
[35]BARRAULT L,BIESIALSKA M,BOJAR O,et al.Findings of the 2020 conference on machine translation[C]//Proceedings of the Fifth Conference on Machine Translation.2020:1-55.
[36]KOEHN P.Europarl:A parallel corpus for statistical machine translation[C]//Proceedings of Machine Translation Summit X:Papers.2005:79-86.
[37]ZIEMSKI M,JUNCZYS-DOWMUNT M,POULIQUEN B.The united nations parallel corpus v1.0[C]//Proceedings of the Tenth International Conference on Language Resources and Evaluation(LREC’16).2016:3530-3534.
[38]LISON P,TIEDEMANN J.OpenSubtitles2016:Extracting large parallel corpora from movie and TV subtitles[C]//10th Conference on International Language Resources and Evaluation(LREC’16).European Language Resources Association,2016:923-929.
[39]BAÑÓN M,CHEN P,HADDOW B,et al.ParaCrawl:Web-scale acquisition of parallel corpora[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:4555-4567.
[40]VASWANI A,BENGIO S,BREVDO E,et al.Tensor2Tensor for Neural Machine Translation[C]//Proceedings of the 13th Conference of the Association for Machine Translation in the Americas(Volume 1:Research Track).2018:193-199.
[41]EISELE A,CHEN Y.MultiUN:A multilingual corpus from United Nation documents[C]//LREC.2010.
[42]KWON S,GO B H,LEE J H.A text-based visual context modulation neural model for multimodal machine translation[J].Pattern Recognition Letters,2020,136:212-218.