Computer Science, 2017, Vol. 44, Issue (1): 42-47, 74. doi: 10.11896/j.issn.1002-137X.2017.01.008

Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec

YANG Xiao-ping, ZHANG Zhong-xia, WANG Liang, ZHANG Yong-jun, MA Qi-feng, WU Jia-nan and ZHANG Yue   

Online: 2018-11-13 Published: 2018-11-13

Abstract: The construction of sentiment lexicons plays an important role in text mining. In recent years, lexicon annotation formats have gradually evolved from binary annotation to multi-class annotation, and sentiment lexicons for a single specific domain have attracted increasing attention from researchers. However, manual annotation costs too much labor and time, and it is difficult to quantify emotional intensity accurately. Moreover, the excessive emphasis on one specific field greatly limits the applicability of domain sentiment lexicons[1]. This paper trains a neural network language model on a large-scale Chinese corpus and proposes an automatic method for constructing a multidimensional sentiment lexicon based on a group of Euclidean distance constraints. To distinguish the sentiment polarity of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of the lexicon. Lastly, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of the SentiRuc lexicon in the category labeling test, the intensity labeling test, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the second-place lexicon by 23%.

Key words: Sentiment analysis, Multivariate sentiment classification, Neural network language model, Sentiment disambiguation, Optimization framework of sentiment intensity
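
The idea of deriving a multidimensional sentiment entry from Euclidean distances between word vectors can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the method of the paper: the abstract does not give the actual constraint formulation, so the functions `seed_distances` and `sentiment_scores`, the inverse-distance weighting, the two example categories, and the toy 2-D vectors are all hypothetical. In practice the vectors would come from a neural network language model trained on a large corpus.

```python
import numpy as np

def seed_distances(vec, seed_groups):
    """Mean Euclidean distance from `vec` to each group of seed vectors."""
    return np.array([
        np.mean(np.linalg.norm(np.asarray(seeds) - vec, axis=1))
        for seeds in seed_groups
    ])

def sentiment_scores(vec, seed_groups, eps=1e-9):
    """Map distances to a normalized intensity vector (closer = stronger).

    Inverse-distance weighting is an assumption made for illustration.
    """
    inv = 1.0 / (seed_distances(vec, seed_groups) + eps)
    return inv / inv.sum()

# Toy 2-D "embeddings"; hypothetical stand-ins for trained word vectors.
seed_groups = [
    [[1.0, 0.0], [0.9, 0.1]],  # hypothetical category 0, e.g. "joy"
    [[0.0, 1.0], [0.1, 0.9]],  # hypothetical category 1, e.g. "anger"
]
word_vec = np.array([0.8, 0.2])

scores = sentiment_scores(word_vec, seed_groups)  # one intensity per category
label = int(np.argmax(scores))                    # index of the closest category
```

With ten seed groups instead of two, `scores` would be a 10-dimensional vector of the kind a multidimensional lexicon stores per word; how the paper constrains and optimizes these values is not specified in the abstract.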

[1] WANG Hong-ning,LU Yue,ZHAI Cheng-xiang.Latent aspect rating analysis on review text data:a rating regression approach[C]∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2010.Washington,DC,USA,2010:783-792.
[2] CHOI Y,CARDIE C.Adapting a Polarity Lexicon using Integer Linear Programming for Domain-Specific Sentiment Classification[C]∥Conference on Empirical Methods in Natural Language Processing.2009:590-598.
[3] ESULI A,SEBASTIANI F.SentiWordNet:a publicly available lexical resource for opinion mining[C]∥Proceedings of LREC.Genoa,Italy:LREC,2006:417-422.
[4] BACCIANELLA S,ESULI A,SEBASTIANI F.SentiWordNet 3.0:An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining[C]∥International Conference on Language Resources and Evaluation,LREC 2010.Valletta,Malta,2010:83-90.
[5] TANG Da-ta.National Taiwan University:simplified Chinese emotional dictionary [EB/OL].[2013-03-05].
[6] XU Lin-hong,LIN Hong-fei,PAN Yu,et al.Constructing the affective lexicon ontology [J].Journal of the China Society for Scientific and Technical Information,2008,27(2):180-185.(in Chinese) 徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J].情报学报,2008,27(2):180-185.
[7] NEVIAROUSKAYA A,PRENDINGER H,ISHIZUKA M.SentiFul:A Lexicon for Sentiment Analysis [J].IEEE Transactions on Affective Computing,2011,2(1):22-36.
[8] OSGOOD C E.The nature and measurement of meaning [J].Psychological Bulletin,1952,49(3):197-237.
[9] QUAN Chang-qin,REN Fu-ji.Construction of a blog emotion corpus for Chinese emotional expression analysis[C]∥Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing:Volume 3,Association for Computational Linguistics.2009:1446-1454.
[10] FELLBAUM C,MILLER G.WordNet:An Electronic Lexical Database[M].Bradford Book,1998.
[11] General Inquirer (GI).Harvard University.[EB/OL].[2012-04-25].
[12] DONG Zhen-dong.HowNet word set for sentiment analysis[CP/OL].[2012-04-25].(in Chinese) 董振东.知网情感分析用词语集[CP/OL].[2012-04-25].
[13] HE Feng-ying.Orientation analysis for Chinese blog text based on semantic comprehension [J].Journal of Computer Applications,2011,31(8):2130-2133.(in Chinese) 何凤英.基于语义理解的中文博文倾向性分析[J].计算机应用,2011,31(8):2130-2133.
[14] LI Rong-jun,WANG Xiao-jie,ZHOU Yan-quan.Semantic Orientation Computing Using PageRank Model [J].Journal of Beijing University of Posts and Telecommunications,2010,5(5):141-144.(in Chinese) 李荣军,王小捷,周延泉.PageRank模型在中文情感词极性判别中的应用[J].北京邮电大学学报,2010,5(5):141-144.
[15] COLACE F,SANTO M D,GRECO L.SAFE:A Sentiment Analysis Framework for E-Learning[J].International Journal of Emerging Technologies in Learning,2014,9(6):37-41.
[16] MUKKAMALA R R,HUSSAIN A,VATRAPU R.Fuzzy-Set Based Sentiment Analysis of Big Social Data[C]∥IEEE 18th International Enterprise Distributed Object Computing Conference (EDOC),2014.IEEE,2014:71-80.
[17] TURNEY P D,LITTMAN M L.Measuring Praise and Criticism:Inference of Semantic Orientation from Association[J].ACM Transactions on Information Systems,2003,21(4):315-346.
[18] CHEN Lu,WANG Wen-bo,NAGARAJAN M,et al.Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter[C]∥The Sixth International AAAI Conference on Weblogs and Social Media(ICWSM).2012.
[19] JO Y,OH A H.Aspect and sentiment unification model for online review analysis[C]∥Proceedings of the Fourth ACM International Conference on Web Search and Data Mining.ACM,2011:815-824.
[20] NEVIAROUSKAYA A,PRENDINGER H,ISHIZUKA M.SentiFul:Generating a reliable lexicon for sentiment analysis[C]∥3rd International Conference on Affective Computing and Intelligent Interaction and Workshops,2009(ACII 2009).IEEE,2009:1-6.
[21] MOHAMMAD S,DUNNE C,DORR B.Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus[C]∥Proc.of 2009 Conference on Empirical Methods in Natural Language Processing(EMNLP’09).2009:599-608.
[22] CONTE H R,PLUTCHIK R.A circumplex model for interpersonal personality traits[J].Journal of Personality & Social Psychology,1981(4):701-711.
[23] TOM M.Statistical Language Models based on Neural Networks[D].Brno University of Technology,2012.
[24] TOM M,KARAFI T M,BURGET L,et al.Recurrent neural network based language model[C]∥Conference of the International Speech Communication Association,2010.Makuhari,Chiba,Japan,2010:1045-1048.
[25] CHEN Jian-mei,LIN Hong-fei,YANG Zhi-hao.Word Emotion Disambiguation Based on Bayesian Model[C]∥The Ninth China National Conference on Computational Linguistics,2007.(in Chinese) 陈建美,林鸿飞,杨志豪.基于贝叶斯模型的词汇情感消歧[C]∥内容计算的研究与应用前沿——第九届全国计算语言学学术会议论文集.2007.
[26] DING Ru-yi,ZHOU Hui,LIN Ma.Cognitive Appraisal Basis of Gratitude[J].Acta Psychologica Sinica,2014,46(10):1463-1475.(in Chinese) 丁如一,周晖,林玛.感激情绪的认知评估体系[J].心理学报,2014,46(10):1463-1475.
