Computer Science, 2019, Vol. 46, Issue (10): 19-26. doi: 10.11896/jsjkx.191000531C

• Big Data & Data Science •

Sentiment Analysis of User Comments Based on Extraction of Key Words and Key Sentences

YU Ying1, CHEN Ke1,2, SHOU Li-dan1,2, CHEN Gang1,2, WU Xiao-fan3

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  2. Key Laboratory of Big Data Intelligent Computing of Zhejiang Province (Zhejiang University), Hangzhou 310027, China
  3. Netease (Hangzhou) Network Co., Ltd, Hangzhou 310051, China
  • Received: 2018-07-11 Revised: 2018-09-15 Online: 2019-10-15 Published: 2019-10-21
  • Corresponding author: CHEN Ke (born 1977), female, Ph.D., associate professor, master's supervisor, CCF member. Her main research interests include spatio-temporal databases, data mining and data privacy protection. E-mail: chenk@zju.edu.cn.
  • About the authors: YU Ying (born 1993), female, master's degree; main research interests include data mining and sentiment analysis. SHOU Li-dan (born 1974), male, Ph.D., professor, doctoral supervisor, CCF member; main research interests include spatial databases, data mining and data visualization. CHEN Gang (born 1973), male, Ph.D., professor, doctoral supervisor; main research interest is big data management. WU Xiao-fan (born 1984), male, Ph.D.; main research interest is big data intelligence.
  • Funding: This work was supported by the National Key R&D Program of China (2017YFB1201001), the National Natural Science Foundation of China (61672455) and the Natural Science Foundation of Zhejiang Province (LY18F020005).

Abstract: A main task of sentiment analysis is to determine the sentiment polarity (positive or negative) of a document from its content. Different words and sentences contribute differently to a document's sentiment, so accurately extracting the words and sentences most relevant to sentiment classification, and thereby improving classification performance, is an important problem. In the supervised experiments, this paper uses dependency syntactic relations to analyze the logical structure of sentences, extracts the words most relevant to expressing sentiment and weights them, which improves classification performance. In the semi-supervised experiments, a key sentence extraction algorithm for Chinese reviews is combined with a classifier fusion algorithm. Key sentence extraction selects the sentences in a document that contain more sentiment words and carry a summarizing meaning, taking into account each sentence's sentiment-word attribute, position attribute, punctuation attribute and keyword attribute. The classifier fusion algorithm then lets the sub-classifier with the highest confidence decide the classification result. On datasets from Dianping and Toutiao News, the proposed method outperforms existing classical algorithms, which demonstrates the effectiveness of keyword extraction based on dependency parsing and of feature-based Chinese key sentence extraction.

Key words: Co-training method, Dependency parsing, Key sentence extraction, Semi-supervised learning, Sentiment analysis
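
As a rough illustration of the key sentence extraction and classifier fusion ideas summarized in the abstract, the following Python sketch scores each sentence on four attributes (sentiment words, position, punctuation, summary keywords) and lets the most confident sub-classifier decide the label. The lexicons, weights, function names and the English toy input are all assumptions made for illustration; the abstract does not specify the paper's actual feature definitions, weights or fusion rule.

```python
import re

# Hypothetical toy resources: the paper's real lexicons and weights are not
# given in the abstract, so these values are placeholders.
SENTIMENT_WORDS = {"good", "great", "bad", "terrible", "delicious", "awful"}
SUMMARY_WORDS = {"overall", "in short", "in summary"}
WEIGHTS = {"sentiment": 1.0, "position": 0.5, "punctuation": 0.5, "keyword": 1.0}


def split_sentences(text):
    """Split text on sentence-final punctuation, keeping the punctuation."""
    parts = re.findall(r"[^.!?。!?]+[.!?。!?]*", text)
    return [p.strip() for p in parts if p.strip()]


def score_sentence(sentence, index, total):
    """Weighted sum of the four sentence attributes (assumed scoring form)."""
    lowered = sentence.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    sentiment = sum(1 for t in tokens if t in SENTIMENT_WORDS)
    # Assumption: first and last sentences are more likely to summarize.
    position = 1.0 if index in (0, total - 1) else 0.0
    punctuation = 1.0 if sentence.endswith(("!", "!")) else 0.0
    keyword = 1.0 if any(w in lowered for w in SUMMARY_WORDS) else 0.0
    return (WEIGHTS["sentiment"] * sentiment
            + WEIGHTS["position"] * position
            + WEIGHTS["punctuation"] * punctuation
            + WEIGHTS["keyword"] * keyword)


def extract_key_sentence(text):
    """Return the highest-scoring sentence, or an empty string for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    scores = [score_sentence(s, i, len(sentences)) for i, s in enumerate(sentences)]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]


def fuse_by_confidence(probabilities):
    """Each value is one sub-classifier's P(positive).

    The sub-classifier whose probability is farthest from 0.5 is treated as
    the most confident one and decides the final label.
    """
    most_confident = max(probabilities, key=lambda p: abs(p - 0.5))
    return "positive" if most_confident >= 0.5 else "negative"


review = "We arrived at noon. The noodles were fine. Overall, the service was terrible!"
key_sentence = extract_key_sentence(review)
label = fuse_by_confidence([0.55, 0.10, 0.70])
```

In this toy run the last sentence wins because it contains a sentiment word, a summary keyword and an exclamation mark, and sits in the final position; the fused label follows the sub-classifier that reported 0.10, since it is farthest from 0.5.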

CLC Number: TP391