Computer Science ›› 2019, Vol. 46 ›› Issue (11): 186-192. doi: 10.11896/jsjkx.180901702

• Artificial Intelligence •

  • Corresponding author: XUE Man-yi, born in 1995, master's candidate. His main research interests include natural language processing and data mining. E-mail: 1270405074@qq.com
  • About the authors: ZHENG Cheng, born in 1964, associate professor and master's supervisor. His main research interests include natural language processing and data mining. E-mail: csahu@126.com. HONG Tong-tong, born in 1994, master's candidate. Her main research interests include natural language processing and data mining. SONG Fei-bao, born in 1994, master's candidate. His main research interests include intelligent optimization and data mining.

DC-BiGRU_CNN Model for Short-text Classification

ZHENG Cheng, XUE Man-yi, HONG Tong-tong, SONG Fei-bao   

  1. (School of Computer Science and Technology, Anhui University, Hefei 230601, China)
    (Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Hefei 230601, China)
  • Received: 2018-09-11  Online: 2019-11-15  Published: 2019-11-14



Abstract: Text classification is a fundamental task in natural language processing, and deep learning techniques are now widely used for it. When processing text sequences, convolutional neural networks can extract local features and recurrent neural networks can extract global features, and both perform well. However, convolutional neural networks cannot capture the context-dependent semantic information of text well, and recurrent neural networks are not sensitive to key semantic information. In addition, although deeper networks can extract better features, they are prone to vanishing or exploding gradients. To address these problems, this paper proposes a hybrid model based on a densely connected bidirectional gated recurrent unit convolutional network (DC-BiGRU_CNN). First, a standard convolutional neural network is used to train character-level word vectors, which are then concatenated with word-level word vectors to form the network input layer. Inspired by densely connected convolutional networks, the proposed densely connected bidirectional gated recurrent unit is used in the high-level semantic modeling stage; it alleviates the vanishing/exploding gradient problem and strengthens the transfer of features between layers, thus achieving feature reuse. Next, convolution and pooling operations are applied to the deep high-level semantic representation to obtain the final semantic feature representation, which is fed into a softmax layer to complete the text classification task. Experimental results on several public datasets show that DC-BiGRU_CNN achieves a significant improvement in classification accuracy. In addition, this paper analyzes the contribution of the model's different components to the performance improvement, and studies the effect of parameters such as the maximum sentence length, the number of network layers, and the size of the convolution kernel on the model.
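The dense connectivity the abstract describes can be sketched independently of any deep learning framework: the input to layer l is the concatenation of the original input and the outputs of all preceding layers, so features are reused and gradients have short paths to every layer. The sketch below is a minimal illustration of that wiring only; `bigru_stub` is a hypothetical stand-in (a random projection with tanh) for the paper's bidirectional GRU layer, and the dimensions are arbitrary assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bigru_stub(x, hidden):
    """Stand-in for a BiGRU layer: any sequence encoder mapping
    (seq_len, in_dim) -> (seq_len, 2*hidden) fits in this slot."""
    w = rng.standard_normal((x.shape[1], 2 * hidden)) * 0.1
    return np.tanh(x @ w)

def densely_connected_encoder(x, num_layers=3, hidden=8):
    """Dense connection: each layer sees the concatenation of the
    input and every earlier layer's output (feature reuse)."""
    feats = [x]
    for _ in range(num_layers):
        inp = np.concatenate(feats, axis=-1)  # all features so far
        feats.append(bigru_stub(inp, hidden))
    # final representation stacks every layer's features
    return np.concatenate(feats, axis=-1)

seq = rng.standard_normal((20, 16))  # 20 tokens, 16-dim embeddings
out = densely_connected_encoder(seq)
print(out.shape)  # (20, 64): 16 input dims + 3 layers * 16 dims each
```

In the full model, this encoder output would then pass through the convolution, pooling, and softmax stages described above; here only the growth of the feature dimension with depth is demonstrated.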

Key words: Bi-directional gated recurrent unit, Character-level word vector, Convolutional neural network, Dense connection, Text classification

CLC Number: TP391.1