Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (1): 42-47. doi: 10.11896/j.issn.1002-137X.2017.01.008

• The 6th China Conference on Data Mining (2016) •

基于Word2Vec的情感词典自动构建与优化

杨小平,张中夏,王良,张永俊,马奇凤,吴佳楠,张悦   

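  1. School of Information, Renmin University of China, Beijing 100872 (all authors)
  • Publication date: 2018-11-13  Release date: 2018-11-13
  • Funding:
    This work was supported by the National Natural Science Foundation of China (71271209), the Beijing Natural Science Foundation (4132067), the Humanities and Social Sciences Youth Foundation of the Ministry of Education (11YJC630268), and an open project of the State Key Laboratory of Digital Publishing Technology.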

Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec

YANG Xiao-ping, ZHANG Zhong-xia, WANG Liang, ZHANG Yong-jun, MA Qi-feng, WU Jia-nan and ZHANG Yue   

  • Online: 2018-11-13  Published: 2018-11-13

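Abstract (translated from Chinese): Building a sentiment lexicon is important foundational work in text mining. In recent years, polarity annotation in sentiment lexicons has moved from binary positive/negative labels toward multi-category emotion labels, and lexicons have become increasingly domain-specific. However, manually annotating emotion categories is time-consuming and labor-intensive, emotion intensity is hard to quantify accurately, and an excessive focus on domain specificity greatly limits a lexicon's applicability [1]. This paper trains a neural network language model on a large-scale Chinese corpus and, on that basis, proposes a method for automatically constructing a multidimensional sentiment lexicon based on a set of transformation constraints. It then studies a sentiment disambiguation method based on word distribution density, which distinguishes and identifies the polarity of words that carry both positive and negative connotations and computes the emotion category and intensity separately under each polarity. Finally, it proposes a global optimization scheme based on multiple semantic resources, yielding SentiRuc, a multidimensional Chinese sentiment lexicon annotated with 10 emotions. Experiments confirm that the lexicon performs well in the category labeling test, the intensity labeling test, sentiment disambiguation, and sentiment classification tasks; the intensity test in particular shows that the lexicon has very strong power to describe emotional semantics.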

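Keywords (translated from Chinese): sentiment analysis, multi-category sentiment classification, neural network language model, sentiment disambiguation, sentiment intensity optimization framework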

Abstract: The construction of sentiment lexicons plays an important role in text mining. In recent years, lexicon annotation has gradually evolved from binary annotation to multi-category annotation, and sentiment lexicons for a single specific domain have attracted increasing attention from researchers. However, manual annotation costs too much labor and time, and it is difficult to quantify emotional intensity accurately. Moreover, an excessive emphasis on one specific field greatly limits the applicability of domain sentiment lexicons [1]. This paper trains a neural network language model on a large-scale Chinese corpus and proposes an automatic method for constructing a multidimensional sentiment lexicon based on a group of Euclidean distance constraints. To distinguish the sentiment polarities of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of the lexicon. Lastly, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of the SentiRuc lexicon in the category labeling test, the intensity labeling test, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the runner-up by 23%.

Key words: Sentiment analysis, Multivariate sentiment classification, Neural network language model, Sentiment disambiguation, Optimization framework of sentiment intensity
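
The abstract mentions labeling words through Euclidean distance constraints in a learned vector space. As a rough illustration of that general idea only, the sketch below assigns a word to the emotion category whose seed words are closest on average in Euclidean distance, and derives a simple intensity score from that distance. This is not the authors' algorithm: the toy vectors, the seed lists, the two category names, and the `1 / (1 + d)` intensity formula are all assumptions made for illustration, and real vectors would come from a trained Word2Vec model.

```python
import math

# Toy 3-dimensional "embeddings" (made up for illustration; real vectors
# would come from a Word2Vec model trained on a large corpus).
EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "sad": [0.1, 0.9, 0.0],
    "gloomy": [0.2, 0.8, 0.1],
    "cheerful": [0.85, 0.15, 0.05],
}

# Hypothetical seed words for two emotion categories.
SEEDS = {"joy": ["happy", "joyful"], "sadness": ["sad", "gloomy"]}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_seed_distance(word, seeds):
    """Average distance from `word` to every seed word of one category."""
    vec = EMBEDDINGS[word]
    return sum(euclidean(vec, EMBEDDINGS[s]) for s in seeds) / len(seeds)


def label_word(word):
    """Return (category, intensity) for the closest category.

    Intensity is an assumed score in (0, 1]: it grows as the word gets
    closer to the seeds of the chosen category.
    """
    distances = {cat: mean_seed_distance(word, seeds) for cat, seeds in SEEDS.items()}
    category = min(distances, key=distances.get)
    intensity = 1.0 / (1.0 + distances[category])
    return category, intensity


if __name__ == "__main__":
    print(label_word("cheerful"))
```

With real embeddings, the same loop would run over the full vocabulary and all 10 emotion categories; choosing good seed words and a calibrated intensity scale is where a method like the one described in the abstract would do the actual work.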

[1] WANG Hong-ning,LU Yue,ZHAI Cheng-xiang.Latent aspect rating analysis on review text data:a rating regression approach[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2010.Washington,DC,USA,2010:783-792.
[2] CHOI Y,CARDIE C.Adapting a Polarity Lexicon using Integer Linear Programming for Domain-Specific Sentiment Classification[C]∥Conference on Empirical Methods in Natural Language Processing.2009:590-598.
[3] ESULI A,SEBASTIANI F.SentiWordNet:a publicly available lexical resource for opinion mining[C]∥Proceedings of LREC.Genoa,Italy:LREC,2006:417-422.
[4] BACCIANELLA S,ESULI A,SEBASTIANI F.SentiWordNet 3.0:An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining[C]∥International Conference on Language Resources and Evaluation,LREC 2010.Valletta,Malta,2010:83-90.
[5] TANG Da-ta.National Taiwan University:simplified Chinese emotional dictionary [EB/OL].[2013-03-05].http://www.datatang.com/data/11837.
[6] XU Lin-hong,LIN Hong-fei,PAN Yu,et al.Constructing the affective lexicon ontology [J].Journal of the China Society for Scientific and Technical Information,2008,27(2):180-185.(in Chinese) 徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J].情报学报,2008,27(2):180-185.
[7] NEVIAROUSKAYA A,PRENDINGER H,ISHIZUKA M.SentiFul:A Lexicon for Sentiment Analysis [J].IEEE Transactions on Affective Computing,2011,2(1):22-36.
[8] OSGOOD C E.The nature and measurement of meaning [J].Psychological Bulletin,1952,49(3):197-237.
[9] QUAN Chang-qin,REN Fu-ji.Construction of a blog emotion corpus for Chinese emotional expression analysis[C]∥Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing:Volume 3,Association for Computational Linguistics.2009:1446-1454.
[10] FELLBAUM C,MILLER G.WordNet:An Electronic Lexical Database[M].Bradford Book,1998.
[11] General Inquirer (GI).Harvard University.[EB/OL].[2012-04-25].http://www.wjh.harvard.edu/~inquirer.
[12] DONG Zhen-dong.HowNet sentiment analysis word set (知网情感分析用词语集)[CP/OL].[2012-04-25].http://www.keenage.com.
[13] HE Feng-ying.Orientation analysis for Chinese blog text based on semantic comprehension [J].Journal of Computer Applications,2011,31(8):2130-2133.(in Chinese) 何凤英.基于语义理解的中文博文倾向性分析[J].计算机应用,2011,31(8):2130-2133.
[14] LI Rong-jun,WANG Xiao-jie,ZHOU Yan-quan.Semantic Orientation Computing Using PageRank Model [J].Journal of Beijing University of Posts and Telecommunications,2010,5(5):141-144.(in Chinese) 李荣军,王小捷,周延泉.PageRank模型在中文情感词极性判别中的应用[J].北京邮电大学学报,2010,5(5):141-144.
[15] COLACE F,SANTO M D,GRECO L.SAFE:A Sentiment Analysis Framework for E-Learning[J].International Journal of Emerging Technologies in Learning,2014,9(6):37-41.
[16] MUKKAMALA R R,HUSSAIN A,VATRAPU R.Fuzzy-Set Based Sentiment Analysis of Big Social Data[C]∥IEEE 18th International Enterprise Distributed Object Computing Conference (EDOC),2014.IEEE,2014:71-80.
[17] TURNEY P D,LITTMAN M L.Measuring Praise and Criticism:Inference of Semantic Orientation from Association[J].ACM Transactions on Information Systems,2003,21(4):315-346.
[18] CHEN Lu,WANG Wen-bo,NAGARAJAN M,et al.Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter[C]∥The Sixth International AAAI Conference on Weblogs and Social Media(ICWSM).2012.
[19] JO Y,OH A H.Aspect and sentiment unification model for online review analysis[C]∥Proceedings of the Fourth ACM International Conference on Web Search and Data Mining.ACM,2011:815-824.
[20] NEVIAROUSKAYA A,PRENDINGER H,ISHIZUKA M.SentiFul:Generating a reliable lexicon for sentiment analysis[C]∥3rd International Conference on Affective Computing and Intelligent Interaction and Workshops,2009(ACII 2009).IEEE,2009:1-6.
[21] MOHAMMAD S,DUNNE C,DORR B.Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus[C]∥Proc.of 2009 Conference on Empirical Methods in Natural Language Processing(EMNLP’09).2009:599-608.
[22] CONTE H R,PLUTCHIK R.A circumplex model for interpersonal personality traits[J].Journal of Personality & Social Psychology,1981(4):701-711.
[23] TOM M.Statistical Language Models based on Neural Networks[D].Brno University of Technology,2012.
[24] TOM M,KARAFI T M,BURGET L,et al.Recurrent neural network based language model[C]∥Conference of the International Speech Communication Association,2010.Makuhari,Chiba,Japan,2010:1045-1048.
[25] CHEN Jian-mei,LIN Hong-fei,YANG Zhi-hao.Word Emotion Disambiguation Based on Bayesian Model[C]∥The Ninth China National Conference on Computational Linguistics,2007.(in Chinese) 陈建美,林鸿飞,杨志豪.基于贝叶斯模型的词汇情感消歧[C]∥内容计算的研究与应用前沿——第九届全国计算语言学学术会议论文集.2007.
[26] DING Ru-yi,ZHOU Hui,LIN Ma.Cognitive Appraisal Basis of Gratitude[J].Acta Psychologica Sinica,2014,46(10):1463-1475.(in Chinese) 丁如一,周晖,林玛.感激情绪的认知评估体系[J].心理学报,2014,46(10):1463-1475.
