Computer Science, 2022, Vol. 49, Issue (4): 288-293. doi: 10.11896/jsjkx.211100016
ZHONG Gui-feng (钟桂凤)1, PANG Xiong-wen (庞雄文)2, SUI Dong (隋栋)3
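Abstract: To improve the accuracy and runtime efficiency of text classification, this paper proposes a text classification method that combines Word2Vec text representation with AlexNet-2, an improved AlexNet augmented with an attention mechanism. First, Word2Vec is used to embed the word features of the text and to train word vectors, so that the text is represented as distributed vectors. Then, the improved AlexNet-2 encodes long-distance word dependencies, and an attention mechanism is added to the model so that it efficiently learns the contextual embedding semantics of each target word and adjusts word weights according to the correlation between the input word vectors and the final prediction. The method is evaluated on three public datasets, covering both the case with many labeled samples and the case with few labeled samples. The experimental results show that, compared with existing strong methods, the proposed method clearly improves both the performance and the runtime efficiency of text classification.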
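The word-weight adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's attention formulation, so this example assumes generic dot-product attention with a softmax; the functions `softmax` and `attention_pool`, the toy vectors, and the `query` vector are all hypothetical and stand in for trained Word2Vec embeddings and a learned relevance signal.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Score each word vector against `query` with a dot product,
    normalize the scores with softmax, and return the weights together
    with the weighted sum of the word vectors.

    Assumption: dot-product scoring is a generic choice, not necessarily
    the scoring function used in the paper.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(weights[i] * word_vectors[i][d] for i in range(len(word_vectors)))
        for d in range(dim)
    ]
    return weights, pooled


# Toy 3-dimensional "word vectors" (hypothetical values, not trained by Word2Vec).
word_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
# Hypothetical relevance vector standing in for the prediction-side signal.
query = [2.0, 0.0, 0.0]

weights, pooled = attention_pool(word_vectors, query)
```

Here the first word aligns best with `query`, so it receives the largest weight and dominates the pooled representation; in a full model the pooled vector (or the reweighted sequence) would feed the classifier.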