Computer Science (计算机科学) ›› 2012, Vol. 39 ›› Issue (Z11): 146-148.

• Software Engineering •

Sentiment Analysis of Chinese Micro-blogs Based on Emoticons and Emotional Words

Zhang Shan, Yu Liubao, Hu Changjun

  1. (School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Micro-blogs are a new social media platform of the Web 2.0 era. Internet users express their feelings, emotions, likes and dislikes through micro-blogs, producing a massive amount of emotional text. By analyzing this text, we can learn users' emotional state, their views on a social phenomenon, and their preference for a product; this information not only has commercial value but also contributes to social stability. In this paper, we use the emoticons from micro-blogs, combined with emotional words, to build a Chinese micro-blog sentiment corpus, which ensures the scale and accuracy of the corpus while sparing the manual effort. Based on the corpus, we construct a Bayes classifier, and then use the concept of entropy to optimize the corpus and improve classification accuracy. We also compare the performance obtained with different n-gram features. We find that the best classification results are obtained using unigrams as features and optimizing with entropy: recall and accuracy both exceed 85%, and the F-measure can even exceed 89%.

Key words: Sentiment analysis, Emoticons, Emotional words, Micro-blog
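
The abstract names three steps (emoticon-based corpus construction, a Bayes classifier over unigram features, and entropy-based optimization) without giving their details. The sketch below is one possible reading under stated assumptions, not the authors' implementation: the emoticon and word lists are hypothetical placeholders, posts are assumed to be already segmented into tokens, a post is labeled only when an emoticon and an emotional word of the same polarity co-occur, and entropy is applied by dropping unigrams whose class distribution is close to uniform (the `max_entropy` threshold is an illustrative value).

```python
import math
from collections import Counter

# Hypothetical seed lists; the paper's actual resources are not given in the abstract.
POS_EMOTICONS = {"[哈哈]", "[good]"}
NEG_EMOTICONS = {"[怒]", "[泪]"}
POS_WORDS = {"开心", "喜欢"}
NEG_WORDS = {"讨厌", "难过"}


def auto_label(tokens):
    """Label a tokenized post only when an emoticon and an emotional word agree."""
    toks = set(tokens)
    pos = bool(toks & POS_EMOTICONS) and bool(toks & POS_WORDS)
    neg = bool(toks & NEG_EMOTICONS) and bool(toks & NEG_WORDS)
    if pos == neg:  # no evidence, or evidence for both polarities
        return None
    return "pos" if pos else "neg"


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def train(corpus, max_entropy=0.9):
    """Fit a unigram naive Bayes model on (tokens, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tokens, label in corpus:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    # Assumed entropy filter: keep unigrams that occur mostly in one class.
    vocab = {
        w for w in vocab
        if entropy([word_counts["pos"][w], word_counts["neg"][w]]) <= max_entropy
    }
    return word_counts, doc_counts, vocab


def classify(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for label in ("pos", "neg"):
        total = sum(word_counts[label][w] for w in vocab)
        # Add-one smoothing on both the class prior and the unigram likelihoods.
        score = math.log((doc_counts[label] + 1) / (n_docs + 2))
        for w in tokens:
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


posts = [
    ["今天", "很", "开心", "[哈哈]"],
    ["我", "喜欢", "这个", "[good]"],
    ["今天", "很", "难过", "[泪]"],
    ["我", "讨厌", "这个", "[怒]"],
    ["今天", "天气"],  # no emotional evidence, so it stays out of the corpus
]
corpus = [(t, auto_label(t)) for t in posts if auto_label(t) is not None]
model = train(corpus)
print(classify(model, ["今天", "开心"]))
```

In this toy corpus, tokens such as "今天" appear equally often in both classes, so the entropy filter removes them and only polarity-bearing unigrams contribute to the score.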
