Computer Science ›› 2017, Vol. 44 ›› Issue (10): 283-288. doi: 10.11896/j.issn.1002-137X.2017.10.051

• Artificial Intelligence •

Method of Short Text Opinion Recognition Based on Feature Extension and Deep Learning

DU Yong-ping (杜永萍), CHEN Shou-qin (陈守钦) and ZHAO Xiao-zheng (赵晓铮)

  1. College of Computer Science, Beijing University of Technology, Beijing 100124 (all three authors)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by a sub-project of the National Key Technology R&D Program (2013BAH21B02-01) and the Beijing Natural Science Foundation.

Abstract: Chinese short texts carry little information and have sparse features. This paper studies sentiment classification of microblog short texts. To extract sentiment features more effectively, content that stands in a semantically progressive relationship to the original post is mined from its context, such as comments and reposts, and used to extend the original text. Feature words with potential sentiment coloring are then extracted, and Word2vec is used to compute word similarity in order to extend the set of candidate feature words. Finally, a deep belief network (DBN) performs deep adaptive learning over the candidate feature words. Experiments on the COAE (Chinese Opinion Analysis Evaluation) 2015 task dataset show that the method effectively alleviates the feature sparseness of short texts, mines sentiment features fairly accurately, and improves sentiment classification performance, reaching a precision of 64.1%.

Key words: Opinion mining, Short text, Feature extension, Deep belief network
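
The similarity-based feature extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_features`, the parameters `top_k` and `min_sim`, and the toy three-dimensional vectors are all assumptions made for illustration. In the paper, the vectors would come from a trained Word2vec model, and the extended features would then be passed to a DBN.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_features(seed_words, embeddings, top_k=2, min_sim=0.7):
    """Extend seed sentiment words with their nearest neighbours.

    `embeddings` maps word -> vector (hypothetical stand-in for Word2vec).
    A neighbour is added only if it is among the `top_k` most similar
    words to some seed and its similarity is at least `min_sim`.
    """
    seeds = set(seed_words)
    expanded = list(seed_words)
    seen = set(seed_words)
    for seed in seed_words:
        if seed not in embeddings:
            continue  # out-of-vocabulary seeds cannot be expanded
        scored = [
            (cosine(embeddings[seed], vec), word)
            for word, vec in embeddings.items()
            if word not in seeds
        ]
        scored.sort(reverse=True)
        for sim, word in scored[:top_k]:
            if sim >= min_sim and word not in seen:
                seen.add(word)
                expanded.append(word)
    return expanded


# Toy vectors, invented for this example only.
TOY_EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.85, 0.15, 0.05],
    "angry": [0.1, 0.9, 0.0],
    "furious": [0.05, 0.95, 0.1],
    "table": [0.0, 0.1, 0.9],
}

expanded = expand_features(["happy", "angry"], TOY_EMBEDDINGS, top_k=1)
print(expanded)
```

With these toy vectors, each seed word picks up its closest neighbour while the unrelated word "table" is left out, which is the effect the extension step relies on to reduce feature sparseness.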

