Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230700112-8. DOI: 10.11896/jsjkx.230700112

• Artificial Intelligence •

Recent Progress on Machine Translation Based on Pre-trained Language Models

YANG Binxia, LUO Xudong, SUN Kaili   

  1. School of Computer Science and Engineering & School of Software, Guangxi Normal University, Guilin, Guangxi 541004, China
  2. Key Laboratory of Blockchain and Intelligent Technology in Education, Ministry of Education, Guilin, Guangxi 541004, China
  3. Guangxi Key Lab of Multi-source Information Mining & Security, Guilin, Guangxi 541004, China
  • Published: 2024-06-06
  • About author: YANG Binxia, born in 1999, postgraduate. Her main research interest is natural language processing (NLP).
    LUO Xudong, born in 1963, Ph.D, distinguished professor, Ph.D supervisor. His main research interests include natural language processing, intelligent decision-making, game theory, automated negotiation and fuzzy logic.
  • Supported by:
    Guangxi Key Lab of Multi-source Information Mining & Security(22-A-01-02).

Abstract: Natural language processing (NLP) involves many important topics, one of which is machine translation (MT). Pre-trained language models (PLMs), such as BERT and GPT, are state-of-the-art approaches for various NLP tasks, including MT. Therefore, many researchers use PLMs to solve MT problems. To push the research forward, this paper provides an overview of recent advances in this field, including the main research questions and solutions based on various PLMs. We compare the motivations, commonalities, differences and limitations of these solutions, and summarise the datasets commonly used to train such MT models, as well as the metrics used to evaluate them. Finally, further research directions are discussed.
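The two ingredients the abstract highlights, translating with a pre-trained multilingual model and scoring the output with an automatic metric, can be illustrated with a short sketch. The snippet below is not taken from the surveyed paper; it assumes the Hugging Face transformers and sacrebleu packages and the facebook/mbart-large-50-many-to-many-mmt checkpoint, and the German source sentence and English reference are made up for illustration.

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import sacrebleu

# Illustrative checkpoint and sentences (assumptions, not from the paper).
model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="de_DE")
model = MBartForConditionalGeneration.from_pretrained(model_name)

source = ["Maschinelle Übersetzung ist ein zentrales Thema der Computerlinguistik."]
reference = ["Machine translation is a central topic in computational linguistics."]

# Translate German -> English with the pre-trained multilingual model.
batch = tokenizer(source, return_tensors="pt")
outputs = model.generate(
    **batch,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],  # force English output
    max_length=64,
)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Score the hypotheses with corpus-level BLEU, a standard MT evaluation metric.
bleu = sacrebleu.corpus_bleu(hypotheses, [reference])
print(hypotheses[0])
print(f"BLEU = {bleu.score:.1f}")

Other surface-overlap metrics such as chrF and TER are available through the same sacrebleu package and could be substituted for BLEU in the last step.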

Key words: Natural language processing, Machine translation, Pre-trained language model, BERT, GPT

CLC Number: TP391