Computer Science, 2020, Vol. 47, Issue (1): 231-236. doi: 10.11896/jsjkx.181102130

• Artificial Intelligence •

Using SVM Method Optimized by Improved Particle Swarm Optimization to Analyze Emotion of Chinese Text

WANG Li-zhi1, MU Xiao-dong1, LIU Hong-lan2

  1. Department of Information Engineering, Rocket Force University of Engineering, Xi’an 710025, China
  2. School of Computer & Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  • Received: 2018-11-09 Published: 2020-01-19
  • Corresponding author: MU Xiao-dong (wascom4@sina.com)
  • About author: WANG Li-zhi, born in 1994, Ph.D. His main research interests include natural language processing and computer vision; MU Xiao-dong, born in 1965, Ph.D. supervisor. His main research interests include intelligent information processing and computer simulation.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61702525).

Abstract: In recent years, as the number of Internet users has grown, the volume of user comments has increased explosively, bringing with it a large amount of information that can be used for reference and mined in depth. Text sentiment classification has emerged in response. Prediction accuracy and execution speed are the key measures of a classification model's quality. Using a traditional SVM for text sentiment classification is simple and easy to implement, but its classification accuracy is determined by the model parameters. To address this, this paper combines an improved particle swarm optimization (PSO) algorithm with the SVM classification method and uses the resulting PSO-optimized SVM to analyze the sentiment of film and television drama reviews. First, Douban movie review data are collected with a web crawler; after preprocessing, the text is vectorized with weighted word2vec so that it can serve as input to the support vector machine. Then, PSO is improved with an adaptive inertia-decreasing strategy and a crossover operator, and the improved PSO is used to optimize the loss function, penalty parameter, and kernel parameter of the SVM model. Finally, the text is classified by sentiment. Experimental results on the same data set show that the proposed method avoids the weakness of traditional sentiment-dictionary methods, which are affected by word order and context, as well as the vanishing or dispersing gradients that arise when convolution is used. It also overcomes the tendency of PSO to become trapped in local optima. Compared with other methods, the proposed classification model runs faster and effectively improves classification accuracy.

Key words: Inertia decreasing, Web crawler, Particle swarm optimization, Sentiment analysis, SVM classification

CLC Number: TP391
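
The improved PSO named in the abstract (a decreasing inertia weight plus a crossover operator) can be illustrated with a minimal sketch. The abstract does not give the exact update rules, so the linear inertia schedule, the arithmetic crossover, and every parameter value below are assumptions, not the authors' method. The function `toy_error` is a hypothetical stand-in for the cross-validated SVM error over (log10 C, log10 gamma); in practice it would be replaced by a routine that trains and scores an SVM.

```python
import random


def improved_pso(objective, bounds, n_particles=20, n_iters=80,
                 w_max=0.9, w_min=0.4, c1=1.5, c2=1.5,
                 crossover_rate=0.2, seed=0):
    """Minimize `objective` inside the box `bounds` (list of (low, high))."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        # Keep every coordinate inside its search interval.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iters):
        # Assumed schedule: inertia falls linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i] = clip([x + v for x, v in zip(pos[i], vel[i])])

        # Assumed crossover: blend a particle with a randomly chosen partner
        # to add diversity and reduce the risk of premature convergence.
        for i in range(n_particles):
            if rng.random() < crossover_rate:
                j = rng.randrange(n_particles)
                a = rng.random()
                child_i = [a * x + (1 - a) * y for x, y in zip(pos[i], pos[j])]
                child_j = [a * y + (1 - a) * x for x, y in zip(pos[i], pos[j])]
                pos[i], pos[j] = child_i, child_j

        # Update personal and global bests after movement and crossover.
        for i in range(n_particles):
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(x):
    # Hypothetical error surface with its minimum at log10(C)=1, log10(gamma)=-2.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, best_val = improved_pso(toy_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
print("best (log10 C, log10 gamma):", best, "error:", best_val)
```

Searching in log space is a common choice for SVM penalty and kernel parameters because useful values span several orders of magnitude; whether the paper does so is not stated in the abstract.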