Computer Science ›› 2022, Vol. 49 ›› Issue (1): 24-30.doi: 10.11896/jsjkx.210800254

• Multilingual Computing Advanced Technology •

Similarity-based Curriculum Learning for Multilingual Neural Machine Translation

YU Dong1, XIE Wan-ying1, GU Shu-hao2,3, FENG Yang2,3   

  1 College of Information Sciences,Beijing Language and Culture University,Beijing 100083,China
    2 Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China
    3 University of Chinese Academy of Sciences,Beijing 100049,China
  • Received:2021-08-28 Revised:2021-10-18 Online:2022-01-15 Published:2022-01-18
  • About author:YU Dong,born in 1982,Ph.D,associate professor,master supervisor,is a member of China Computer Federation.His main research interests include natural language processing and artificial intelligence.
    FENG Yang,born in 1982,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.Her main research interests include natural language processing,machine translation and dialogue.
  • Supported by:
    Humanity and Social Science Youth Foundation of Ministry of Education(19YJCZH230) and Research Funds of Beijing Language and Culture University(20YCX138).

Abstract: Multilingual neural machine translation (MNMT), which handles multiple languages with a single model, has drawn increasing attention. However, the current multilingual translation paradigm does not exploit the similar features shared across languages, even though such features have been proven useful for improving multilingual translation. In addition, training a multilingual model is usually very time-consuming because of the huge amount of training data. To address these problems, we propose a similarity-based curriculum learning method that improves both overall performance and convergence speed. We design two hierarchical similarity criteria: one ranks different languages (inter-language) with singular vector canonical correlation analysis, and the other ranks sentences within a language (intra-language) with cosine similarity. We further propose a curriculum learning strategy that uses the validation-set loss as the criterion for switching curriculum stages. Experiments on balanced and unbalanced IWSLT multilingual datasets and the Europarl corpus show that the proposed method outperforms strong multilingual translation systems and reduces training time by up to 64%.
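The intra-language sentence ranking and the validation-loss-based stage switching described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sentence vectors, the anchor vector, and the `patience` parameter are assumptions introduced for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two sentence representation vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(sent_vecs, anchor):
    """Return sentence indices ordered from most to least similar to `anchor`,
    so that the 'easiest' (most similar) sentences come first in the curriculum."""
    sims = [cosine_similarity(v, anchor) for v in sent_vecs]
    return sorted(range(len(sent_vecs)), key=lambda i: sims[i], reverse=True)

def should_advance(val_losses, patience=2):
    """Switch to the next curriculum stage when the validation loss has not
    improved over the last `patience` evaluations."""
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    return all(loss >= best_so_far for loss in val_losses[-patience:])
```

In this sketch, training would iterate over curriculum stages built from the ranked sentence list, calling `should_advance` on the recorded validation losses after each evaluation to decide when to move to the next stage.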

Key words: Curriculum learning, Language ranking, Machine translation, Multilingual, Sentence ranking, Similarity evaluation

CLC Number: TP391