Computer Science ›› 2018, Vol. 45 ›› Issue (11): 226-230. doi: 10.11896/j.issn.1002-137X.2018.11.035

• Artificial Intelligence •

Neural Machine Translation Based on Attention Convolution

WANG Qi, DUAN Xiang-yu

  1. (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
  • Received: 2018-04-18  Published: 2019-02-25
  • About the authors: WANG Qi (1994-), female, master's student, CCF member; her main research interests include natural language processing and machine translation, E-mail: littlewqq@gmail.com. DUAN Xiang-yu (1976-), male, associate professor; his main research interests include natural language processing and machine translation, E-mail: xiangyuduan@suda.edu.cn (corresponding author).
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61673289) and the National Key R&D Program of China, Key Special Project on Intergovernmental International Scientific and Technological Innovation Cooperation (2016YFE0132100).



Abstract: The attention mechanism commonly used in existing neural machine translation models operates at the word level. By building a multi-layer convolutional structure on top of the attention mechanism, this paper raises the attention mechanism from the word level to the phrase level. After the convolution operation, the attention information reflects phrase structure more clearly and is used to generate new context vectors, which are then integrated into the neural machine translation framework. Experimental results on large-scale Chinese-to-English tasks show that neural machine translation based on attention convolution can effectively capture phrasal information in sentences, strengthen the context dependencies around translated words, optimize the context vectors, and improve translation quality.
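The paper does not publish code, but the core idea described above can be sketched roughly: convolve the word-level attention weights along the source positions so that neighboring weights are pooled into phrase-level weights, then use the smoothed distribution to form a new context vector. The following minimal NumPy sketch is an illustration under stated assumptions; the function names, the smoothing kernel, and the number of convolution layers are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_same(a, kernel):
    # 1-D convolution over source positions, zero-padded to keep the same length
    k = len(kernel)
    pad = k // 2
    padded = np.pad(a, pad)
    return np.array([padded[i:i + k] @ kernel for i in range(len(a))])

def phrase_level_context(scores, enc_states, kernel, n_layers=2):
    """Word-level alignment scores -> phrase-level attention -> context vector."""
    attn = softmax(scores)        # word-level attention weights
    for _ in range(n_layers):     # stacked convolutions widen the phrase span
        attn = conv1d_same(attn, kernel)
    attn = attn / attn.sum()      # renormalize to a probability distribution
    return attn @ enc_states      # new (phrase-aware) context vector

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))           # 6 encoder states of dimension 4
scores = rng.normal(size=6)           # alignment scores for one decoding step
kernel = np.array([0.25, 0.5, 0.25])  # illustrative smoothing kernel
c = phrase_level_context(scores, H, kernel)
print(c.shape)  # (4,)
```

In an actual model the kernel weights would be learned parameters rather than a fixed smoothing filter, and stacking more layers corresponds to attending over progressively wider phrase spans.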

Key words: Attention mechanism, Multi-layer convolutional structure, Neural machine translation, Phrase-based level

CLC Number: TP391