Computer Science, 2017, Vol. 44, Issue (10): 296-301. doi: 10.11896/j.issn.1002-137X.2017.10.053

• Artificial Intelligence •

Extraction Method of Sentimental Feature Vector Based on Semantic Similarity

LIN Jiang-hao, ZHOU Yong-mei, YANG Ai-min and CHEN Jin

  1. Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006; Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006; International College, Guangdong University of Foreign Studies, Guangzhou 510420
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Social Science Foundation of China (12BYY045).

Abstract: To address the shortcomings of existing sentiment features in semantic expression and domain extension, this paper proposes a method for extracting sentimental feature vectors based on semantic similarity. First, a Word2vec model is trained on 250 thousand Sogou news texts and 500 thousand micro-blog texts. Eighty sentiment words with clear sentiment, rich content and diverse parts of speech are chosen as the seed word set. Then, the semantic similarity between each candidate sentiment word and the seed words is calculated from their word vectors, which maps the sentiment words into a high-dimensional vector space and yields their feature vector representation (Senti2vec). Senti2vec is applied to similarity analysis of sentiment synonyms and antonyms, polarity classification of sentiment words, and text sentiment analysis. The experimental results show that Senti2vec can represent both the meaning and the sentiment of sentiment words. Because the semantic similarity is computed from a large-scale corpus, the extracted sentiment features are more extensible across domains.

Key words: Sentimental feature vector, Semantic similarity, Sentiment word, Word2vec
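
The mapping described in the abstract can be illustrated with a minimal sketch. It assumes that the semantic similarity is the cosine similarity between word vectors and that each seed word contributes one component of the resulting vector; the abstract does not state either detail, so both are assumptions. The function names (`cosine_similarity`, `senti2vec`) and the three-dimensional toy embeddings are hypothetical stand-ins for a trained Word2vec model and the 80-word seed set.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

def senti2vec(word, seeds, embeddings):
    """Represent `word` by its similarity to each seed word (assumed scheme).

    `embeddings` maps a word to its word vector; here it is a plain dict,
    standing in for vectors looked up from a trained Word2vec model.
    """
    word_vec = embeddings[word]
    return np.array([cosine_similarity(word_vec, embeddings[s]) for s in seeds])

# Toy vectors for illustration only; real vectors would come from the corpus.
embeddings = {
    "happy": np.array([1.0, 0.0, 0.0]),
    "sad": np.array([-1.0, 0.0, 0.0]),
    "joyful": np.array([2.0, 0.0, 0.0]),
}
seeds = ["happy", "sad"]  # hypothetical two-word seed set

vec = senti2vec("joyful", seeds, embeddings)
print(vec)  # one component per seed word
```

Under these assumptions, a candidate word ends up with one similarity score per seed word, so a seed set of 80 words would presumably give an 80-dimensional vector.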
