Computer Science (计算机科学), 2020, Vol. 47, Issue 11A: 86-91. doi: 10.11896/jsjkx.200200003
纪明轩1, 宋玉蓉2
JI Ming-xuan1, SONG Yu-rong2
Abstract: In machine translation, the self-attention mechanism has attracted wide attention because its highly parallelizable computation shortens model training time and it effectively captures the semantic relatedness among all words in a context. Unlike recurrent neural networks, however, self-attention owes its efficiency to ignoring the positional structure of the context words. To give the model access to word-order information, the self-attention-based translation model Transformer represents each word's absolute position with sinusoidal position encodings; although such encodings can reflect relative distance, they lack directionality. This paper combines a logarithmic position representation with the self-attention mechanism and proposes a new machine translation model that inherits the efficiency of self-attention while preserving both the distance and the direction information between words. Experiments show that, compared with the conventional self-attention model and other models, the proposed model significantly improves translation accuracy.
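The paper's exact logarithmic formulation is not given in this excerpt, so the following is a minimal NumPy sketch of the two ideas the abstract relies on: first, that sinusoidal encodings reflect relative distance but are direction-blind, and second, that a signed logarithmic relative-position term added to the attention logits can restore directionality. The helper names (`sinusoidal_encoding`, `log_position_bias`, `attention_with_log_positions`) and the signed-log form itself are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def sinusoidal_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal position encodings as in Transformer (Vaswani et al., 2017)."""
    positions = np.arange(num_positions)[:, None]            # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # even dim indices 2i
    angles = positions / np.power(10000.0, dims / d_model)   # pos / 10000^(2i/d)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(64, 128)
# pe[i] . pe[j] reduces to sum_k cos((i - j) * w_k): it depends only on |i - j|,
# so "2 positions to the left" and "2 positions to the right" look identical.
print(np.allclose(pe[12] @ pe[10], pe[12] @ pe[14]))  # True: direction is lost
print(np.allclose(pe[10] @ pe[14], pe[20] @ pe[24]))  # True: only distance matters

def log_position_bias(q_len: int, k_len: int) -> np.ndarray:
    """Hypothetical signed-log bias: sign(i - j) keeps the direction,
    log1p(|i - j|) compresses the distance logarithmically."""
    rel = np.arange(q_len)[:, None] - np.arange(k_len)[None, :]
    return np.sign(rel) * np.log1p(np.abs(rel))

def attention_with_log_positions(Q, K, V):
    """Scaled dot-product attention with the bias added to the logits,
    so a key before the query scores differently from one after it."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    logits = logits + log_position_bias(Q.shape[0], K.shape[0])
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```

Adding the signed-log term to the attention logits, rather than to the word embeddings, keeps the representation relative: distance to far-away words is compressed logarithmically while the left/right distinction survives, which matches the distance-plus-direction property the abstract claims for the proposed model.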