Computer Science, 2022, Vol. 49, Issue (6A): 144-149. doi: 10.11896/jsjkx.210500205

• Intelligent Computing •

Aspect-level Sentiment Classification Based on Imbalanced Data and Ensemble Learning

LIN Xi (林夕), CHEN Zi-zhuo (陈孜卓), WANG Zhong-qing (王中卿)

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Publication date: 2022-06-10  Release date: 2022-06-08
  • Corresponding author: WANG Zhong-qing (wangzq@suda.edu.cn)
  • About the authors: LIN Xi (linxi350904583@foxmail.com), born in 2000. His main research interests include natural language processing.
    WANG Zhong-qing, born in 1987, Ph.D., is a member of the China Computer Federation. His main research interests include natural language processing and sentiment analysis.

Abstract: Sentiment classification has long been an important research topic in natural language processing. The task generally assigns sentiment-bearing samples to one of two classes, positive or negative. Many theoretical models assume that positive and negative samples are balanced, but in practice the two classes are usually imbalanced. This paper proposes an ensemble learning method based on aspect-level LSTM for aspect-level sentiment classification on imbalanced data. First, the dataset is under-sampled and divided into multiple groups. Second, a classification algorithm is assigned to each group and trained on it. Finally, the models from all groups are fused to produce the final classification result. A series of experiments shows that the aspect-level LSTM ensemble method noticeably improves classification accuracy and outperforms the traditional LSTM classification approach.

Key words: Aspect word, Ensemble learning, Imbalanced data, LSTM, Sentiment classification

CLC number: TP391
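The abstract outlines a three-step procedure (under-sample and split the data into groups, train one classifier per group, fuse the models) without giving implementation details. The Python sketch below shows one common way to realize such a scheme, in the spirit of EasyEnsemble-style under-sampling: the shuffled majority class is split into disjoint slices roughly the size of the minority class, each slice is paired with all minority samples, and the per-group models are combined by majority vote. All names (`balanced_groups`, `train_nearest_centroid`, `ensemble_predict`) are hypothetical; the nearest-centroid learner is only a stand-in for the paper's aspect-level LSTM, and majority voting is an assumed fusion rule, not necessarily the one the authors use.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Sample = Tuple[Features, int]            # (feature vector, class label)
Classifier = Callable[[Features], int]   # maps a feature vector to a label


def balanced_groups(data: Sequence[Sample], seed: int = 0) -> List[List[Sample]]:
    """Step 1 (assumed scheme, binary labels): split the shuffled majority class
    into k disjoint slices, each roughly the size of the minority class, and
    pair every slice with all minority samples, giving k balanced groups."""
    counts = Counter(label for _, label in data)
    minority = min(counts, key=counts.get)
    minority_set = [s for s in data if s[1] == minority]
    majority_set = [s for s in data if s[1] != minority]
    random.Random(seed).shuffle(majority_set)
    k = max(1, len(majority_set) // len(minority_set))
    # Strided slices are disjoint and together cover every majority sample.
    return [minority_set + majority_set[i::k] for i in range(k)]


def train_nearest_centroid(group: Sequence[Sample]) -> Classifier:
    """Step 2: hypothetical stand-in for the aspect-level LSTM. Predicts the
    label whose class centroid is nearest in squared Euclidean distance."""
    sums: Dict[int, List[float]] = {}
    sizes: Counter = Counter()
    for x, y in group:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        sizes[y] += 1
    centroids = {y: [v / sizes[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Features) -> int:
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def ensemble_predict(models: Sequence[Classifier], x: Features) -> int:
    """Step 3 (assumed fusion rule): majority vote; ties go to the smaller label."""
    votes = Counter(model(x) for model in models)
    top = max(votes.values())
    return min(label for label, n in votes.items() if n == top)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic toy data with a 1:4 imbalance: label 1 near (1, 1), label 0 near (-1, -1).
    pos = [((1 + rng.gauss(0, 0.3), 1 + rng.gauss(0, 0.3)), 1) for _ in range(20)]
    neg = [((-1 + rng.gauss(0, 0.3), -1 + rng.gauss(0, 0.3)), 0) for _ in range(80)]
    groups = balanced_groups(pos + neg)
    models = [train_nearest_centroid(g) for g in groups]
    print(len(groups), ensemble_predict(models, (0.9, 1.1)))  # prints: 4 1
```

Because every group reuses all minority samples but sees a different slice of the majority class, no majority sample is discarded overall, unlike a single random under-sample; the cost is training k models instead of one.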