Text Classification Method Based on Bidirectional Attention and Gated Graph Convolutional Networks

doi:10.11896/jsjkx.211100095

Abstract

Abstract: Existing text classification models based on graph convolutional networks usually update node representations by simply fusing neighborhood information of different orders through the adjacency matrix, which leaves the word-sense information of the nodes insufficiently expressed. In addition, models based on conventional attention mechanisms only assign positive weights to word embeddings, ignoring the effect that words with a negative influence have on the final classification. To address these problems, this paper proposes a model based on a bidirectional attention mechanism and gated graph convolutional networks. First, the model uses a gated graph convolutional network to selectively fuse the multi-order neighborhood information of the nodes in the graph while retaining the information of previous orders, which enriches the node feature representations. Second, a bidirectional attention mechanism learns how different words influence the classification result: words that contribute positively to the classification receive positive weights, and words that contribute negatively receive negative weights that weaken their influence in the vector representation, which improves the model's ability to distinguish nodes of different natures in a document. Finally, max pooling and average pooling are combined to fuse the word representations into a document representation for the final classification; average pooling lets every word contribute to the graph-level representation of the document, while max pooling lets the important words play a greater role in the document embedding. Experiments on four benchmark datasets show that the proposed method significantly outperforms the baseline models.
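The abstract names three stages but gives no equations, so the following NumPy sketch is only one possible reading of them, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the GRU-style sigmoid gate that mixes the new neighborhood message with the previous-order state, the tanh-based signed attention weights, and the choice to concatenate (rather than sum) the max-pooled and average-pooled vectors.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # >= 1 per row thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gated_graph_conv(h, a_norm, w, w_gate):
    """One hop of message passing (assumed form).

    A sigmoid gate decides, per node and per feature, how much of the new
    neighborhood message to take and how much of the previous-order state to keep.
    """
    msg = np.tanh(a_norm @ h @ w)                               # aggregated neighbors
    gate = sigmoid(np.concatenate([h, msg], axis=1) @ w_gate)   # values in (0, 1)
    return gate * msg + (1.0 - gate) * h


def bidirectional_attention(h, v):
    """Signed attention (assumed form).

    tanh gives each word a score in (-1, 1); dividing by the sum of absolute
    scores keeps the sign, so words can receive negative weights.
    """
    scores = np.tanh(h @ v)
    weights = scores / (np.abs(scores).sum() + 1e-9)
    return h * weights[:, None], weights


def readout(h):
    """Document vector: max pooling and average pooling over words, concatenated."""
    return np.concatenate([h.max(axis=0), h.mean(axis=0)])


# Toy document graph: 5 word nodes connected in a chain, 4 features per node.
rng = np.random.default_rng(0)
n, d = 5, 4
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

a_norm = normalize_adjacency(adj)
h = rng.normal(size=(n, d))
w = rng.normal(size=(d, d)) * 0.5           # hypothetical layer weights
w_gate = rng.normal(size=(2 * d, d)) * 0.5  # hypothetical gate weights
v = rng.normal(size=d)                      # hypothetical attention vector

for _ in range(2):                          # two hops: up to second-order neighbors
    h = gated_graph_conv(h, a_norm, w, w_gate)

weighted, weights = bidirectional_attention(h, v)
doc = readout(weighted)
print(doc.shape)  # (8,)
```

In this sketch the random matrices stand in for learned parameters; in a trained model the document vector `doc` would be fed to a classifier layer, which is omitted here.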

Key words: Text classification, Graph convolutional networks, Attention mechanism, Text representation, Deep learning, Natural language processing

CLC number: TP391

ZHENG Cheng (郑诚), MEI Liang (梅亮), ZHAO Yiyan (赵伊研), ZHANG Suhang (张苏航). Text Classification Method Based on Bidirectional Attention and Gated Graph Convolutional Networks[J]. Computer Science (计算机科学), 2023, 50(1): 221-228. https://doi.org/10.11896/jsjkx.211100095
