Computer Science ›› 2020, Vol. 47 ›› Issue (4): 178-183. doi: 10.11896/jsjkx.190600149
张鹏飞, 李冠宇, 贾彩燕
ZHANG Peng-fei, LI Guan-yu, JIA Cai-yan
Abstract: In natural language understanding tasks, the attention mechanism has drawn wide interest because it effectively captures how important each word is in its context and thereby improves task performance. The Transformer, a non-recurrent deep network built on attention, not only achieved state-of-the-art machine translation performance with comparatively few parameters and little training time, but has also produced remarkable results in natural language inference (Gaussian-Transformer) and word representation learning (BERT). Gaussian-Transformer is currently among the best-performing methods for natural language inference. However, encoding word position in the Transformer with a Gaussian prior distribution, while greatly amplifying the importance of neighboring words, makes the weights of non-neighboring words decay rapidly toward zero, so non-neighboring words that matter for the current word's representation lose nearly all of their influence as distance grows. To address this, this paper proposes a self-attention mechanism for natural language inference based on a truncated Gaussian distance distribution, which not only highlights the importance of neighboring words but also preserves the information of non-neighboring words that are important to the representation of the current word. Experimental results on the natural language inference benchmark datasets SNLI and MultiNLI confirm that the truncated Gaussian distance distribution self-attention mechanism extracts the relative position information of words in a sentence more effectively.
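As an illustration only (not the paper's exact formulation), the following Python sketch shows one way a truncated Gaussian distance prior could be folded into scaled dot-product self-attention: a Gaussian weight over token distance is clipped from below so that distant tokens keep a nonzero weight, and the clipped prior is multiplied into the attention distribution. The function names and the `sigma` and `floor` parameters are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def truncated_gaussian_prior(seq_len, sigma=1.0, floor=0.1):
    """Distance-based prior over positions.

    A plain Gaussian prior exp(-d^2 / (2*sigma^2)) decays toward 0 for
    distant positions; truncating it at `floor` keeps a minimum weight
    for non-neighboring words (illustrative choice, not the paper's formula).
    """
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])        # |i - j| token distance
    prior = np.exp(-dist ** 2 / (2.0 * sigma ** 2))   # Gaussian in distance
    return np.maximum(prior, floor)                   # truncate the lower tail

def prior_weighted_self_attention(X, Wq, Wk, Wv, sigma=1.0, floor=0.1):
    """Scaled dot-product self-attention whose scores are reweighted by a
    truncated Gaussian distance prior (a sketch of the idea in the abstract)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len)
    prior = truncated_gaussian_prior(X.shape[0], sigma, floor)
    # Adding log(prior) before softmax multiplies the attention weights
    # by the (truncated) prior and renormalizes.
    weights = softmax(scores + np.log(prior))
    return weights @ V

# Toy usage: 6 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = prior_weighted_self_attention(X, Wq, Wk, Wv, sigma=1.5, floor=0.05)
print(out.shape)  # (6, 8)
```

With `floor=0`, the weighting reduces to an ordinary Gaussian distance prior, under which distant tokens receive vanishing attention; a positive `floor` keeps them visible, which is the effect the abstract argues for.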
[1] HERMANN K M,KOCISKY T,GREFENSTETTE E,et al.Teaching machines to read and comprehend[C]//Neural Information Processing Systems.2015:1693-1701.
[2] DU X,SHAO J,CARDIE C,et al.Learning to Ask:Neural Question Generation for Reading Comprehension[C]//Meeting of the Association for Computational Linguistics.2017:1342-1352.
[3] LAN W,XU W.Neural Network Models for Paraphrase Identification,Semantic Textual Similarity,Natural Language Inference,and Question Answering[C]//International Conference on Computational Linguistics.2018:3890-3902.
[4] VASWANI A,SHAZEER N,PARMAR N,et al.Attention Is All You Need[C]//Neural Information Processing Systems.2017:5998-6008.
[5] HE K,ZHANG X,REN S,et al.Deep Residual Learning for Image Recognition[C]//Computer Vision and Pattern Recognition.2016:770-778.
[6] DEVLIN J,CHANG M,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[C]//North American Chapter of the Association for Computational Linguistics.2019:4171-4186.
[7] SHEN T,ZHOU T,LONG G,et al.DiSAN:Directional Self-Attention Network for RNN/CNN-Free Language Understanding[C]//National Conference on Artificial Intelligence.2018:5446-5455.
[8] BOWMAN S R,ANGELI G,POTTS C,et al.A large annotated corpus for learning natural language inference[C]//Empirical Methods in Natural Language Processing.2015:632-642.
[9] IM J,CHO S.Distance-based Self-Attention Network for Natural Language Inference[J].arXiv:1712.02047,2017.
[10] GUO M,ZHANG Y,LIU T,et al.Gaussian Transformer:a Lightweight Approach for Natural Language Inference[C]//National Conference on Artificial Intelligence.2019:6489-6496.
[11] KLAMBAUER G,UNTERTHINER T,MAYR A,et al.Self-Normalizing Neural Networks[C]//Neural Information Processing Systems.2017:971-980.
[12] PENNINGTON J,SOCHER R,MANNING C D,et al.GloVe:Global Vectors for Word Representation[C]//Empirical Methods in Natural Language Processing.2014:1532-1543.
[13] MIKOLOV T,CHEN K,CORRADO G S,et al.Efficient Estimation of Word Representations in Vector Space[J].arXiv:1301.3781,2013.
[14] BOJANOWSKI P,GRAVE E,JOULIN A,et al.Enriching Word Vectors with Subword Information[J].Transactions of the Association for Computational Linguistics,2017,5(1):135-146.
[15] CHEN Q,ZHU X,LING Z,et al.Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference[C]//Workshop on Evaluating Vector Space Representations for NLP.2017:36-40.
[16] WILLIAMS A,NANGIA N,BOWMAN S R,et al.A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference[C]//North American Chapter of the Association for Computational Linguistics.2018:1112-1122.
[17] KINGMA D P,BA J.Adam:A Method for Stochastic Optimization[J].arXiv:1412.6980,2014.
[18] ZEILER M D.ADADELTA:An Adaptive Learning Rate Method[J].arXiv:1212.5701,2012.
[19] SRIVASTAVA N,HINTON G E,KRIZHEVSKY A,et al.Dropout:a simple way to prevent neural networks from overfitting[J].Journal of Machine Learning Research,2014,15(1):1929-1958.
[20] ABADI M,AGARWAL A,BARHAM P,et al.TensorFlow:Large-Scale Machine Learning on Heterogeneous Distributed Systems[J].arXiv:1603.04467,2016.