Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 86-91. doi: 10.11896/jsjkx.200200003

• Artificial Intelligence •

New Machine Translation Model Based on Logarithmic Position Representation and Self-attention

JI Ming-xuan1, SONG Yu-rong2   

  1 College of Computer,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
  2 College of Automation,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
  • Online:2020-11-15 Published:2020-11-17
  • About author:JI Ming-xuan, born in 1995, postgraduate. His main research interests include machine translation and emotion analysis.
    SONG Yu-rong, born in 1971, Ph.D., professor, is a member of China Computer Federation. Her main research interests include network information dissemination and its control.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672298,61873326,61802155) and Key Research Projects of Philosophy and Social Sciences in Jiangsu Universities (2018SJZDI142).

Abstract: In machine translation, the self-attention mechanism has attracted widespread attention because its computation is highly parallelizable, which significantly reduces model training time, and because it effectively captures the semantic relevance between all words in the context. However, unlike recurrent neural networks, self-attention owes this efficiency to ignoring the position information of the words in the context. To let the model exploit positional information, the Transformer, a machine translation model based on the self-attention mechanism, represents the absolute position of each word with sine and cosine functions. Although this encoding can reflect relative distance, it lacks directionality. Therefore, a new machine translation model based on logarithmic position representation and self-attention is proposed. The new model inherits the efficiency of the self-attention mechanism while preserving both the distance and the directionality between words. The results show that the new model significantly improves the accuracy of machine translation compared with the traditional self-attention model and other models.
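
A minimal sketch, in Python/NumPy, of the two position-encoding ideas the abstract contrasts: the sinusoidal absolute encoding follows the Transformer [25], while the logarithmic part is an illustrative assumption, since the abstract does not give the exact formula. Here a signed log-distance bias, sign(k-q)*log(1+|k-q|), is added to the attention scores so that both the distance and the direction between positions q and k remain visible; the function names and this additive injection are hypothetical stand-ins, not the paper's implementation.

import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    """Absolute position encoding of the Transformer [25]:
    PE[pos, 2i] = sin(pos / 10000**(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def signed_log_relative_positions(max_len):
    """Hypothetical logarithmic relative-position map:
    r[q, k] = sign(k - q) * log(1 + |k - q|).
    The sign keeps direction (left/right context), the log compresses distance;
    this is an assumed stand-in for the paper's representation."""
    q = np.arange(max_len)[:, None]
    k = np.arange(max_len)[None, :]
    diff = k - q
    return np.sign(diff) * np.log1p(np.abs(diff))

def self_attention_with_position_bias(X, Wq, Wk, Wv, bias):
    """Scaled dot-product self-attention with an additive position bias,
    one common way to inject relative position information into the scores."""
    d_k = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d_k) + bias         # (len, len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 6, 8
    X = rng.normal(size=(L, d)) + sinusoidal_position_encoding(L, d)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    out = self_attention_with_position_bias(X, Wq, Wk, Wv,
                                            signed_log_relative_positions(L))
    print(out.shape)  # (6, 8)

The sign term is exactly what a purely sinusoidal encoding cannot expose after the dot product, which is the directionality gap the abstract points to.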

Key words: Logarithmic position representation, Machine translation, Position encoding, Position information, Self-attention

CLC Number: TP181
[1] SAK H,SENIOR A,BEAUFAYS F.Long short-term memory recurrent neural network architectures for large scale acoustic modeling[C]//Fifteenth Annual Conference of the International Speech Communication Association.2014.
[2] CHUNG J,GULCEHRE C,CHO K H,et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[J].arXiv:1412.3555,2014.
[3] WU Y,SCHUSTER M,CHEN Z,et al.Google's neural machine translation system:Bridging the gap between human and machine translation[J].arXiv:1609.08144,2016.
[4] JOZEFOWICZ R,VINYALS O,SCHUSTER M,et al.Exploring the limits of language modeling[J].arXiv:1602.02410,2016.
[5] NIE Y,HAN Y,HUANG J,et al.Attention-based encoder-decoder model for answer selection in question answering[J].Frontiers of Information Technology & Electronic Engineering,2017,18(4):535-544.
[6] SHAZEER N,MIRHOSEINI A,MAZIARZ K,et al.Outrageously large neural networks:The sparsely-gated mixture-of-experts layer[J].arXiv:1701.06538,2017.
[7] CAO J,LI R.Fixed-time synchronization of delayed memristor-based recurrent neural networks[J].Science China Information Sciences,2017,60(3):108-122.
[8] BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[9] GEHRING J,AULI M,GRANGIER D,et al.Convolutional sequence to sequence learning[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70.JMLR.org,2017:1243-1252.
[10] CHO K,VAN MERRIENBOER B,GULCEHRE C,et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation[J].arXiv:1406.1078,2014.
[11] BAEVSKI A,EDUNOV S,LIU Y,et al.Cloze-driven pretraining of self-attention networks[J].arXiv:1903.07785,2019.
[12] SHEN T,ZHOU T,LONG G,et al.DiSAN:directional self-attention network for RNN/CNN-free language understanding[C]//Thirty-Second AAAI Conference on Artificial Intelligence.2018.
[13] XU D,RUAN C,KORPEOGLU E,et al.Self-attention with functional time representation learning[C]//Advances in Neural Information Processing Systems.2019:15889-15899.
[14] LIANG X B,REN F L,LIU Y K,et al.N-reader:machine reading comprehension model based on double layers of self-attention[J].Journal of Chinese Information Processing,2018,32(10):134-141.
[15] HAO J,WANG X,SHI S,et al.Towards better modeling hierarchical structure for self-attention with ordered neurons[J].arXiv:1909.01562,2019.
[16] SHEN Y,TAN S,SORDONI A,et al.Ordered neurons:integrating tree structures into recurrent neural networks[J].arXiv:1810.09536,2018.
[17] HAO J,WANG X,SHI S,et al.Multi-granularity self-attention for neural machine translation[J].arXiv:1909.02222,2019.
[18] YANG B,WANG L,WONG D F,et al.Convolutional self-attention networks[J].arXiv:1904.03107,2019.
[19] FAN Z W,ZHANG M,LI Z H.BiLSTM-based implicit dis-course relation classification combining self-attention mechanism and syntactic information[J].Computer Science,2019,46(5):221-227.
[20] WANG Y S,LEE H Y,CHEN Y N.Tree Transformer:Integrating Tree Structures into Self-Attention[J].arXiv:1909.06639,2019.
[21] ZHAO H,ZHANG Y,LIU S,et al.PSANet:point-wise spatial attention network for scene parsing[C]//Proceedings of the European Conference on Computer Vision (ECCV).2018:267-283.
[22] WANG X,TU Z,WANG L,et al.Self-attention with structural position representation[J].arXiv:1909.00383,2019.
[23] MICHEL P,LEVY O,NEUBIG G.Are sixteen heads really better than one?[C]//Advances in Neural Information Processing Systems.2019:14014-14024.
[24] VOITA E,TALBOT D,MOISEEV F,et al.Analyzing multi-head self-attention:specialized heads do the heavy lifting,the rest can be pruned[J].arXiv:1905.09418,2019.
[25] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.