Computer Science ›› 2022, Vol. 49 ›› Issue (1): 24-30. doi: 10.11896/jsjkx.210800254

• Frontiers of Multilingual Computing •


Similarity-based Curriculum Learning for Multilingual Neural Machine Translation

YU Dong1, XIE Wan-ying1, GU Shu-hao2,3, FENG Yang2,3   

  1 College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
    2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    3 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-08-28 Revised: 2021-10-18 Online: 2022-01-15 Published: 2022-01-18
  • Corresponding author: FENG Yang (fengyang@ict.ac.cn)
  • About author: YU Dong (yudong@blcu.edu.cn), born in 1982, Ph.D, associate professor, master supervisor, is a member of China Computer Federation. His main research interests include natural language processing and artificial intelligence.
    FENG Yang, born in 1982, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include natural language processing, machine translation and dialogue.
  • Supported by: Humanity and Social Science Youth Foundation of the Ministry of Education (19YJCZH230) and Research Funds of Beijing Language and Culture University (20YCX138).


Abstract: Multilingual neural machine translation (MNMT) with a single model has drawn increasing attention for its capability to handle multiple languages. However, the current multilingual translation paradigm simply mixes the corpora of all languages as training data and does not exploit the related and similar features embodied in different languages, which have already been proven useful for improving multilingual translation. Besides, training a multilingual model is usually very time-consuming because of the large number of languages involved and the huge amount of training data. To address these problems, we propose a similarity-based curriculum learning method to improve the overall performance and convergence speed. We propose two hierarchical criteria for measuring similarity: one ranks different languages (inter-language) with singular vector canonical correlation analysis (SVCCA), and the other ranks different sentences within a particular language (intra-language) with cosine similarity. We further propose a curriculum learning strategy that takes the validation-set loss as the criterion for curriculum replacement, turning the overall training into training on a sequence of curricula and thus reducing the training difficulty. This method fills the gap of curriculum learning strategies in multilingual neural machine translation. We conduct experiments on the balanced and unbalanced IWSLT multilingual datasets and the Europarl corpus. The results demonstrate that the proposed method outperforms strong multilingual translation baselines and reduces training time by up to 64%.
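To make the two similarity criteria concrete, here is a minimal NumPy sketch, assuming per-language encoder representations have already been extracted as matrices of shape (n_samples, hidden_dim) over the same set of sampled sentences; the function names (svcca_similarity, rank_languages), the 99% variance threshold, and the epsilon values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _top_subspace(X, keep=0.99):
    """Project centered X onto the top singular directions that explain
    `keep` of the variance (the "SV" step of SVCCA)."""
    X = X - X.mean(axis=0, keepdims=True)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep)) + 1
    return X @ Vt[:k].T

def svcca_similarity(X, Y, keep=0.99, eps=1e-8):
    """Mean canonical correlation between the top subspaces of two
    representation matrices, each of shape (n_samples, hidden_dim)."""
    Xs, Ys = _top_subspace(X, keep), _top_subspace(Y, keep)
    n = Xs.shape[0]
    cxx = Xs.T @ Xs / (n - 1) + eps * np.eye(Xs.shape[1])
    cyy = Ys.T @ Ys / (n - 1) + eps * np.eye(Ys.shape[1])
    cxy = Xs.T @ Ys / (n - 1)

    def inv_sqrt(c):
        w, v = np.linalg.eigh(c)
        return v @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ v.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations (the "CCA" step of SVCCA).
    rho = np.linalg.svd(inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy), compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())

def cosine_similarity(u, v):
    """Intra-language criterion: cosine similarity of two sentence vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def rank_languages(reps, pivot):
    """Order languages by SVCCA similarity to a pivot language, most
    similar first; `reps` maps language code -> representation matrix."""
    others = [lang for lang in reps if lang != pivot]
    return sorted(others,
                  key=lambda lang: svcca_similarity(reps[pivot], reps[lang]),
                  reverse=True)
```

The curriculum replacement rule can likewise be sketched as the loop below; the curriculum_train name, the patience value, the callback signatures, and the assumption that each new curriculum is added to (rather than replaces) the previous data are illustrative choices, since the abstract does not pin these details down.

```python
def curriculum_train(model, curricula, train_epoch, valid_loss, patience=1):
    """Move through curricula (ordered most- to least-similar) and switch to
    the next curriculum once the validation loss stops improving."""
    seen = []
    for data in curricula:
        seen.extend(data)              # assumption: curricula accumulate
        best, stale = float("inf"), 0
        while stale <= patience:
            train_epoch(model, seen)   # one pass over the current curriculum
            loss = valid_loss(model)   # loss on the held-out validation set
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1             # plateau -> curriculum replacement
    return model
```

Using the validation loss as the switching signal means no hand-tuned schedule is needed: each curriculum is trained only until it stops helping generalization, which is where the reported reduction in training time comes from.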

Key words: Curriculum learning, Language ranking, Machine translation, Multilingual, Sentence ranking, Similarity evaluation

CLC number: TP391