计算机科学 ›› 2020, Vol. 47 ›› Issue (11A): 86-91. doi: 10.11896/jsjkx.200200003

• 人工智能 •

一种基于对数位置表示和自注意力的机器翻译新模型

纪明轩1, 宋玉蓉2   

  1 南京邮电大学计算机学院 南京 210023
    2 南京邮电大学自动化学院 南京 210023
  • 出版日期:2020-11-15 发布日期:2020-11-17
  • 通讯作者: 宋玉蓉(songyr@njupt.edu.cn)
  • 作者简介:793271650@qq.com
  • 基金资助:
    国家自然科学基金(61672298,61873326,61802155);江苏高校哲学社会科学研究重点项目(2018SJZDI142)

New Machine Translation Model Based on Logarithmic Position Representation and Self-attention

JI Ming-xuan1, SONG Yu-rong2   

  1 College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Online: 2020-11-15  Published: 2020-11-17
  • About author: JI Ming-xuan, born in 1995, postgraduate. His main research interests include machine translation and emotion analysis.
    SONG Yu-rong, born in 1971, Ph.D., professor, is a member of China Computer Federation. Her main research interests include network information dissemination and its control.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672298,61873326,61802155) and Key Research Projects of Philosophy and Social Sciences in Jiangsu Universities (2018SJZDI142).

摘要: 在机器翻译任务中,自注意力机制凭借高度可并行化的计算能力减少了模型的训练时间,并能有效地捕捉上下文中所有单词之间的语义相关度,因而受到了广泛关注。然而,不同于循环神经网络,自注意力机制的高效源于忽略上下文单词之间的位置结构信息。为了使模型能够利用单词之间的位置信息,基于自注意力机制的机器翻译模型Transformer使用正余弦位置编码方式表示单词的绝对位置信息。然而,这种方法虽然能够反映出相对距离,却缺乏方向性。文中将对数位置表示方法与自注意力机制相结合,提出一种机器翻译新模型。该模型不仅继承了自注意力机制的高效性,还可以保留单词之间的距离信息与方向性信息。研究表明,与传统的自注意力机制模型以及其他模型相比,文中所提新模型能够显著提高机器翻译的准确性。

关键词: 对数位置表示, 机器翻译, 位置编码, 位置信息, 自注意力

Abstract: In the task of machine translation, the self-attention mechanism has attracted widespread attention because its highly parallelizable computation significantly reduces model training time and because it effectively captures the semantic relevance between all words in the context. However, unlike recurrent neural networks, the self-attention mechanism owes its efficiency to ignoring the positional information of the words in the context. To let the model exploit positional information between words, the self-attention-based machine translation model Transformer represents the absolute position of each word with sine and cosine functions. Although this encoding can reflect relative distance, it lacks directionality. Therefore, a new machine translation model is proposed that combines a logarithmic position representation with the self-attention mechanism. The model not only inherits the efficiency of the self-attention mechanism, but also retains the distance and directionality information between words. Experimental results show that, compared with the traditional self-attention model and other models, the proposed model significantly improves the accuracy of machine translation.
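
The following is a minimal, illustrative sketch (not taken from the paper) of the two position representations discussed in the abstract: the Transformer's sinusoidal absolute position encoding, whose pairwise dot products depend only on the unsigned distance |j-i| and therefore carry no direction, and a hypothetical signed logarithmic relative offset that keeps both distance and direction. The function name signed_log_offset and the form sign(j-i)*log(1+|j-i|) are assumptions made for illustration; the paper's actual logarithmic position representation may differ.

    # Illustrative sketch only; the signed-log form below is an assumption,
    # not the formulation proposed in the paper.
    import numpy as np

    def sinusoidal_encoding(max_len, d_model):
        """Transformer absolute position encoding:
        PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
        PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
        pos = np.arange(max_len)[:, None]             # (max_len, 1)
        i = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)                  # even dimensions
        pe[:, 1::2] = np.cos(angles)                  # odd dimensions
        return pe

    pe = sinusoidal_encoding(max_len=50, d_model=64)
    # PE(p) . PE(p+k) = sum_i cos(k * w_i): it depends only on |k|, so the
    # offsets +3 and -3 are indistinguishable (relative distance, no direction).
    print(np.allclose(pe[10] @ pe[13], pe[10] @ pe[7]))          # True

    def signed_log_offset(i, j):
        """Hypothetical signed logarithmic relative position: the logarithm
        compresses long distances while the sign keeps the direction
        (whether word j is before or after word i)."""
        return np.sign(j - i) * np.log1p(abs(j - i))

    print(signed_log_offset(10, 13), signed_log_offset(10, 7))   # +1.386..., -1.386...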

Key words: Logarithmic position representation, Machine translation, Position encoding, Position information, Self-attention

中图分类号: TP181