Computer Science ›› 2022, Vol. 49 ›› Issue (1): 17-23. doi: 10.11896/jsjkx.210900005

• Multilingual Computing Advanced Technology •

Incorporating Language-specific Adapter into Multilingual Neural Machine Translation

LIU Jun-peng1, SU Jin-song2, HUANG De-gen1   

  1 School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  2 School of Informatics, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2021-09-01  Revised: 2021-10-19  Online: 2022-01-15  Published: 2022-01-18
  • About author: LIU Jun-peng, born in 1992, postgraduate. His main research interests include machine translation.
    HUANG De-gen, born in 1965, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include machine translation and natural language processing.
  • Supported by:
    National Key Research and Development Program of China (2020AAA0108004).

Abstract: Multilingual neural machine translation (mNMT) uses a single encoder-decoder model to translate between multiple language pairs. mNMT can encourage knowledge transfer among related languages, improve low-resource translation, and enable zero-shot translation. However, existing mNMT models are weak at modeling language diversity and perform poorly on zero-shot translation. To address these problems, we first propose a variable-dimension bilingual adapter based on the existing adapter architecture. Bilingual adapters are inserted between every two Transformer sub-layers to extract language-pair-specific features, and the language-pair-specific capacity of the encoder or the decoder can be adjusted by changing the inner dimension of the adapters. We then propose a shared monolingual adapter to model the unique features of each language. Experiments on the IWSLT dataset show that the proposed model remarkably outperforms the multilingual baseline, and that the monolingual adapter improves zero-shot translation without degrading multilingual translation performance.
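
The adapter idea summarized in the abstract can be illustrated with a minimal sketch. It assumes the common residual bottleneck form (layer normalization, down-projection, ReLU, up-projection, residual connection), which the abstract does not spell out. The class names `BottleneckAdapter` and `AdapterBank`, the dimensions, the language codes, and the zero-shot routing at the end are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


class BottleneckAdapter:
    """Residual bottleneck adapter: x + W_up * relu(W_down * layer_norm(x))."""

    def __init__(self, model_dim, inner_dim, seed=0):
        rng = random.Random(seed)
        self.model_dim = model_dim
        self.inner_dim = inner_dim  # the "variable dimension" of the adapter
        # Small random down-projection; zero up-projection so the adapter
        # starts out as an identity mapping (an assumed initialisation).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(model_dim)]
                       for _ in range(inner_dim)]
        self.w_up = [[0.0] * inner_dim for _ in range(model_dim)]

    @staticmethod
    def _layer_norm(x, eps=1e-6):
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        return [(v - mean) / math.sqrt(var + eps) for v in x]

    def __call__(self, x):
        h = self._layer_norm(x)
        # Down-project to inner_dim and apply ReLU.
        z = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in self.w_down]
        # Up-project back to model_dim and add the residual.
        out = [sum(w * v for w, v in zip(row, z)) for row in self.w_up]
        return [xi + oi for xi, oi in zip(x, out)]

    def num_parameters(self):
        # Two weight matrices of shape (inner_dim x model_dim); biases omitted.
        return 2 * self.model_dim * self.inner_dim


class AdapterBank:
    """Maps a key (a language pair or a single language) to its own adapter."""

    def __init__(self, model_dim, inner_dim, keys):
        self.adapters = {k: BottleneckAdapter(model_dim, inner_dim, seed=i)
                         for i, k in enumerate(keys)}

    def __call__(self, key, x):
        return self.adapters[key](x)


MODEL_DIM = 8
# Bilingual adapters: one per language pair. The encoder and decoder sides
# may use different inner dimensions (the values here are arbitrary).
pairs = [("en", "de"), ("en", "fr")]
encoder_bilingual = AdapterBank(MODEL_DIM, inner_dim=2, keys=pairs)
decoder_bilingual = AdapterBank(MODEL_DIM, inner_dim=4, keys=pairs)
# Monolingual adapters: one per language, shared by every pair that
# contains that language.
monolingual = AdapterBank(MODEL_DIM, inner_dim=2, keys=["en", "de", "fr"])

hidden = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0]
enc_out = encoder_bilingual(("en", "de"), hidden)
# Assumed routing: a zero-shot pair such as de->fr has no bilingual adapter,
# but its two languages each have a monolingual adapter.
zero_shot_out = monolingual("fr", monolingual("de", hidden))
```

In this sketch, the adapter's parameter count grows linearly with `inner_dim`, which is one way to read the abstract's statement that capacity can be altered by changing the inner dimension. Keying adapters by single language rather than by pair means an unseen pair can still be served, which is plausibly why monolingual adapters help in the zero-shot setting.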

Key words: Bilingual adapter, Language-specific modeling, Monolingual adapter, Multilingual neural machine translation

CLC Number: TP391