Computer Science, 2021, Vol. 48, Issue (11A): 251-257. doi: 10.11896/jsjkx.201200202

• Big Data & Data Science •

User Interest Dictionary and LSTM Based Method for Personalized Emotion Classification

WANG You-wei1, ZHU Chen1, ZHU Jian-ming1, LI Yang1, FENG Li-zhou2, LIU Jiang-chun1

  1. 1 School of Information, Central University of Finance and Economics, Beijing 100081, China
    2 School of Statistics, Tianjin University of Finance and Economics, Tianjin 300222, China
  • Online: 2021-11-10 Published: 2021-11-12
  • Corresponding author: ZHU Chen (774651475@qq.com)
  • Author contact: ywwang15@126.com
  • About author: WANG You-wei, born in 1987, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include machine learning and data mining.
    ZHU Chen, born in 1992, postgraduate. Her main research interests include data mining and natural language processing.
  • Supported by:
    National Social Science Foundation of China (18CTJ008), Humanities and Social Science Project of the Ministry of Education (19YJCZH178), National Natural Science Foundation of China (61906220), Natural Science Foundation of Tianjin (18JCQNJC69600), and Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory 2020-2021 Open Project (IMDBD202002, IMDBD202004).

Abstract: Microblogs are social platforms where people share their lives, express opinions, and vent emotions. Because the data are plentiful and easy to obtain, microblog data have been widely used for sentiment analysis of web users. Traditional research on microblog emotion prediction stays at the level of word meaning and does not consider personalized factors such as each user's word preferences and language style, which lowers the accuracy of emotion classification. First, this paper constructs a user interest dictionary by analyzing user interest characteristics and, on that basis, proposes an emotion classification model based on the user interest dictionary. Second, exploiting the high classification accuracy of Long Short-Term Memory (LSTM) networks, it trains a general LSTM-based classification model. Finally, it fuses the different models with a Support Vector Machine (SVM) to obtain the final emotion classification results. Experimental results show that, compared with traditional classifiers such as SVM and Naive Bayes, the personalized emotion classification method based on the user interest dictionary and LSTM substantially improves classification accuracy. Compared with deep learning methods such as LSTM and Recurrent Neural Networks, the proposed method achieves higher classification accuracy while maintaining execution efficiency.

Key words: Emotion classification, LSTM model, Support vector machine, User interest dictionary

CLC Number: 

  • TP301.6
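
The three-stage pipeline described in the abstract (a dictionary-based scorer, a general LSTM classifier, and an SVM that fuses them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the per-user dictionary is modeled as a word-to-polarity map, the LSTM is represented only by a precomputed positive-class probability, and the fusion SVM is a small linear SVM trained by subgradient descent on the hinge loss. The names `dictionary_score`, `fusion_features`, `train_linear_svm`, and `predict`, as well as the toy data, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_score(tokens: Sequence[str], user_dict: Dict[str, float]) -> float:
    """Mean polarity of the tokens found in a (hypothetical) per-user dictionary."""
    hits = [user_dict[t] for t in tokens if t in user_dict]
    return sum(hits) / len(hits) if hits else 0.0


def fusion_features(tokens: Sequence[str], user_dict: Dict[str, float],
                    lstm_prob: float) -> Tuple[float, float]:
    """Stack both base-model outputs; the LSTM probability is rescaled to [-1, 1]."""
    return (dictionary_score(tokens, user_dict), 2.0 * lstm_prob - 1.0)


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.001, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Subgradient descent on the L2-regularized hinge loss (primal linear SVM)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage on the weights
            if margin < 1.0:  # margin violated: take a hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (positive emotion) or -1 (negative emotion)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Toy fused features: (dictionary score, rescaled LSTM score). Labels are +1 / -1.
X = [(0.8, 0.6), (0.5, 0.2), (-0.7, -0.4), (-0.2, -0.8), (0.9, -0.1), (-0.6, 0.1)]
y = [1, 1, -1, -1, 1, -1]
w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

In this toy data the last two samples are ones where the rescaled LSTM score leans the wrong way and the dictionary score decides the label, which is the kind of case a fusion step is meant to handle. A practical version would replace the stubbed probability with a trained LSTM and would likely use an established SVM implementation instead of this hand-written trainer.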