Computer Science ›› 2022, Vol. 49 ›› Issue (1): 17-23. doi: 10.11896/jsjkx.210900005
LIU Jun-peng (刘俊鹏)¹, SU Jin-song (苏劲松)², HUANG De-gen (黄德根)¹
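Abstract: Multilingual neural machine translation (NMT) uses a single encoder-decoder model to handle translation among multiple languages at once. It promotes knowledge transfer between related languages, improves translation quality for low-resource languages, and enables translation between unseen language pairs. Existing multilingual NMT models nevertheless model language diversity inadequately and translate unseen language pairs poorly. To address these problems, this paper first proposes a variable-dimension bilingual adapter model built on the existing adapter model: bilingual adapters are inserted between the sub-layers of the Transformer to extract the features specific to each language pair, and the language-specific representation spaces on the encoder and decoder sides are adjusted by changing the adapters' hidden dimension. Second, it proposes a shared monolingual adapter model that captures the features specific to each individual language. Experiments on the IWSLT multilingual translation dataset show that the variable-dimension bilingual adapter model significantly improves multilingual translation performance, while the monolingual adapter model improves translation quality on unseen language pairs without degrading multilingual translation performance.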
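The adapter idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the bottleneck layout (layer norm, down-projection, ReLU, up-projection, residual), the zero-initialised up-projection, and every name below (`Adapter`, `build_bilingual_adapters`, `build_monolingual_adapters`, `enc_hidden`, `dec_hidden`) are assumptions chosen for illustration. The sketch only shows how encoder-side and decoder-side adapters can be given different hidden dimensions, and how adapters can be keyed either by language pair or by single language.

```python
import numpy as np


class Adapter:
    """Bottleneck adapter (assumed layout): layer norm -> down -> ReLU -> up -> residual."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(d_model, d_hidden))
        self.b_down = np.zeros(d_hidden)
        # Assumption: zero-initialised up-projection, so the adapter starts as identity.
        self.w_up = np.zeros((d_hidden, d_model))
        self.b_up = np.zeros(d_model)

    def __call__(self, x):
        # Normalise over the feature axis, then apply the bottleneck and add the residual.
        mean = x.mean(axis=-1, keepdims=True)
        std = x.std(axis=-1, keepdims=True)
        normed = (x - mean) / (std + 1e-6)
        hidden = np.maximum(normed @ self.w_down + self.b_down, 0.0)
        return x + hidden @ self.w_up + self.b_up

    def num_params(self):
        return self.w_down.size + self.b_down.size + self.w_up.size + self.b_up.size


def build_bilingual_adapters(pairs, d_model, enc_hidden, dec_hidden, seed=0):
    """One encoder and one decoder adapter per language pair, with separate hidden sizes."""
    rng = np.random.default_rng(seed)
    return {
        pair: {
            "encoder": Adapter(d_model, enc_hidden, rng),
            "decoder": Adapter(d_model, dec_hidden, rng),
        }
        for pair in pairs
    }


def build_monolingual_adapters(languages, d_model, d_hidden, seed=0):
    """One adapter per language (hypothetical keying), reusable across language pairs."""
    rng = np.random.default_rng(seed)
    return {lang: Adapter(d_model, d_hidden, rng) for lang in languages}


# Usage: different capacity on the two sides for the same language pair.
bilingual = build_bilingual_adapters(
    [("de", "en"), ("en", "de")], d_model=8, enc_hidden=2, dec_hidden=4
)
x = np.ones((3, 8))  # 3 token positions, model dimension 8
y = bilingual[("de", "en")]["encoder"](x)

# Usage: a pair never seen in training can still look up one adapter per side.
monolingual = build_monolingual_adapters(["de", "en", "it"], d_model=8, d_hidden=2)
src_adapter, tgt_adapter = monolingual["de"], monolingual["it"]
```

Keying by language pair needs one adapter set for every trained direction, whereas keying by language needs one per language; under this assumed reading, that is why the monolingual variant can still serve a language pair for which no bilingual adapter was ever trained.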