计算机科学 ›› 2020, Vol. 47 ›› Issue (11A): 86-91. doi: 10.11896/jsjkx.200200003

• 人工智能 •

一种基于对数位置表示和自注意力的机器翻译新模型

纪明轩1, 宋玉蓉2   

  1 南京邮电大学计算机学院 南京 210023
    2 南京邮电大学自动化学院 南京 210023
  • 出版日期:2020-11-15 发布日期:2020-11-17
  • 通讯作者: 宋玉蓉(songyr@njupt.edu.cn)
  • 作者简介:793271650@qq.com
  • 基金资助:
    国家自然科学基金(61672298,61873326,61802155);江苏高校哲学社会科学研究重点项目(2018SJZDI142)

New Machine Translation Model Based on Logarithmic Position Representation and Self-attention

JI Ming-xuan1, SONG Yu-rong2   

  1 College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Online: 2020-11-15  Published: 2020-11-17
  • About author: JI Ming-xuan, born in 1995, postgraduate. His main research interests include machine translation and emotion analysis.
    SONG Yu-rong, born in 1971, Ph.D., professor, is a member of China Computer Federation. Her main research interests include network information dissemination and its control.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672298,61873326,61802155) and Key Research Projects of Philosophy and Social Sciences in Jiangsu Universities (2018SJZDI142).

摘要: 在机器翻译任务中,自注意力机制凭借高度可并行化的计算能力减少了模型的训练时间,并能有效地捕捉上下文中所有单词之间的语义相关度,因而受到了广泛关注。然而,不同于循环神经网络,自注意力机制的高效源于忽略上下文单词之间的位置结构信息。为了使模型能够利用单词之间的位置信息,基于自注意力机制的机器翻译模型Transformer使用正余弦位置编码方式表示单词的绝对位置信息。然而,这种方法虽然能够反映出相对距离,却缺乏方向性。文中将对数位置表示方法与自注意力机制相结合,提出一种机器翻译新模型。该模型不仅继承了自注意力机制的高效性,还可以保留单词之间的距离信息与方向性信息。研究表明,与传统的自注意力机制模型以及其他模型相比,文中所提新模型能够显著提高机器翻译的准确性。

关键词: 对数位置表示, 机器翻译, 位置编码, 位置信息, 自注意力

Abstract: In the task of machine translation, the self-attention mechanism has attracted widespread attention because its highly parallelizable computation significantly reduces model training time and because it effectively captures the semantic relevance between all words in the context. However, unlike recurrent neural networks, the self-attention mechanism owes its efficiency to ignoring the positional information of the words in the context. To let the model exploit positional information between words, the self-attention-based machine translation model Transformer represents the absolute position of each word with sine and cosine functions. Although this encoding can reflect relative distance, it lacks directionality. Therefore, a new machine translation model is proposed that combines a logarithmic position representation with the self-attention mechanism. The model not only inherits the efficiency of the self-attention mechanism, but also retains the distance and directionality information between words. Experimental results show that, compared with the traditional self-attention model and other models, the proposed model significantly improves the accuracy of machine translation.
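
The following is a minimal, illustrative sketch (not taken from the paper) of the two position representations discussed in the abstract: the Transformer's sinusoidal absolute position encoding, whose pairwise dot products depend only on the unsigned distance |j-i| and therefore carry no direction, and a hypothetical signed logarithmic relative offset that keeps both distance and direction. The function name signed_log_offset and the form sign(j-i)*log(1+|j-i|) are assumptions made for illustration; the paper's actual logarithmic position representation may differ.

    # Illustrative sketch only; the signed-log form below is an assumption,
    # not the formulation proposed in the paper.
    import numpy as np

    def sinusoidal_encoding(max_len, d_model):
        """Transformer absolute position encoding:
        PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
        PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
        pos = np.arange(max_len)[:, None]             # (max_len, 1)
        i = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)                  # even dimensions
        pe[:, 1::2] = np.cos(angles)                  # odd dimensions
        return pe

    pe = sinusoidal_encoding(max_len=50, d_model=64)
    # PE(p) . PE(p+k) = sum_i cos(k * w_i): it depends only on |k|, so the
    # offsets +3 and -3 are indistinguishable (relative distance, no direction).
    print(np.allclose(pe[10] @ pe[13], pe[10] @ pe[7]))          # True

    def signed_log_offset(i, j):
        """Hypothetical signed logarithmic relative position: the logarithm
        compresses long distances while the sign keeps the direction
        (whether word j is before or after word i)."""
        return np.sign(j - i) * np.log1p(abs(j - i))

    print(signed_log_offset(10, 13), signed_log_offset(10, 7))   # +1.386..., -1.386...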

Key words: Logarithmic position representation, Machine translation, Position encoding, Position information, Self-attention

中图分类号: TP181