Computer Science, 2022, Vol. 49, Issue (1): 17-23. doi: 10.11896/jsjkx.210900005

• Frontier Technology of Multilingual Computing*

  • Corresponding author: HUANG De-gen (huangdg@dlut.edu.cn)
  • First author: LIU Jun-peng (liujunpeng_nlp@mail.dlut.edu.cn)

Incorporating Language-specific Adapter into Multilingual Neural Machine Translation

LIU Jun-peng1, SU Jin-song2, HUANG De-gen1   

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  2. School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
  • Received:2021-09-01 Revised:2021-10-19 Online:2022-01-15 Published:2022-01-18
  • About author: LIU Jun-peng, born in 1992, postgraduate. His main research interests include machine translation.
    HUANG De-gen, born in 1965, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include machine translation and natural language processing.
  • Supported by:
    National Key Research and Development Program of China(2020AAA0108004).



Abstract: Multilingual neural machine translation (mNMT) uses a single encoder-decoder model to translate between multiple language pairs. mNMT encourages knowledge transfer among related languages, improves low-resource translation, and enables zero-shot translation between language pairs unseen in training. However, existing mNMT models are weak at modeling language diversity and perform poorly on zero-shot translation. To address these problems, we first propose a variable-dimension bilingual adapter built on the existing adapter architecture. Bilingual adapters are inserted between Transformer sub-layers to extract language-pair-specific features, and the language-pair-specific capacity of the encoder or the decoder can be adjusted by changing the inner dimension of the adapters. We then propose a shared monolingual adapter that models the unique features of each language. Experiments on the IWSLT multilingual translation dataset show that the proposed model remarkably outperforms the multilingual baseline, and that the monolingual adapter improves zero-shot translation without degrading supervised multilingual translation performance.
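The bottleneck adapter computation the abstract refers to can be sketched as follows. This is a minimal, illustrative NumPy sketch only: the function name, weight matrices, and dimensions are our own assumptions, not taken from the paper, and the "variable dimension" idea corresponds to choosing a different inner size `r` for encoder-side and decoder-side adapters.

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: down-project to a small inner dimension,
    apply a non-linearity, up-project, and add a residual connection
    so the shared multilingual representation is preserved."""
    z = np.maximum(0.0, h @ W_down)  # ReLU(h @ W_down), shape (tokens, r)
    return h + z @ W_up              # residual connection back to hidden size d

# Hidden size d = 8; adapter inner (bottleneck) dimension r = 4.
# Varying r per side (encoder vs. decoder) changes language-specific capacity.
rng = np.random.default_rng(0)
d, r = 8, 4
h = rng.standard_normal((2, d))          # two token representations
W_down = rng.standard_normal((d, r)) * 0.1
W_up = rng.standard_normal((r, d)) * 0.1
out = adapter_forward(h, W_down, W_up)
print(out.shape)  # (2, 8) -- adapter output keeps the hidden size
```

Because of the residual connection, an adapter whose up-projection is zero-initialized starts as an identity mapping, which is why adapters can be added to a trained multilingual model without initially disturbing it.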

Key words: Bilingual adapter, Language-specific modeling, Monolingual adapter, Multilingual neural machine translation

CLC Number: TP391