Computer Science, 2016, Vol. 43, Issue (Z6): 418-421. doi: 10.11896/j.issn.1002-137X.2016.6A.099

• Data Mining •

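Research of Chinese Comments Sentiment Classification Based on Word2vec and SVMperf

ZHANG Dong-wen, YANG Peng-fei and XU Yun-feng

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
  • Online: 2018-11-14 Published: 2018-11-14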



Abstract: This paper applies supervised machine learning to the sentiment classification of Chinese product reviews, combining two tools, word2vec and SVMperf. First, word2vec is trained on the corpus to obtain a vector for each word; the cosine distances between these vectors are used to cluster words that express similar concepts, and the highly similar domain words found this way are added to the sentiment lexicon. Second, word2vec is used to train high-dimensional word vector representations, which principal component analysis (PCA) reduces to lower-dimensional feature vectors. Finally, effective sentiment features are extracted with each of the two methods, and SVMperf is trained on them and used for prediction, completing the sentiment classification of the text. The experimental results show that the approach performs well on both the lexicon expansion task, using similar-concept clustering, and the sentiment classification task.

Key words: Sentiment classification, word2vec, SVMperf, Semantic features, PCA
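
As a rough illustration of two steps named in the abstract, the sketch below expands a seed sentiment lexicon by cosine similarity and then reduces vector dimensionality with PCA. It is not the authors' implementation: the toy vectors, the seed word, the 0.8 threshold, and the function names `expand_lexicon` and `pca_project` are all assumptions made for the example, and real word vectors would come from a trained word2vec model. Only the Python standard library is used, so PCA is computed here by power iteration on the covariance matrix.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, threshold=0.8):
    """Return the seeds plus every word whose vector is close to some seed.

    The threshold is a hypothetical value chosen for this example.
    """
    known = [vectors[s] for s in seeds if s in vectors]
    expanded = set(seeds)
    for word, vec in vectors.items():
        if word not in expanded and any(cosine(vec, s) >= threshold for s in known):
            expanded.add(word)
    return expanded


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca_project(rows, k, iters=200, seed=0):
    """Project rows (at least two) onto their top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit start vector, then repeated multiplication by cov.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            length = math.sqrt(sum(x * x for x in w))
            if length < 1e-12:
                break  # no variance left to capture
            v = [x / length for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(a * b for a, b in zip(c, comp)) for comp in components]
            for c in centered]


# Toy 3-dimensional "word vectors" (made up for illustration).
toy_vectors = {
    "good": [1.0, 0.1, 0.0],
    "nice": [0.9, 0.2, 0.0],
    "bad": [-1.0, 0.1, 0.0],
    "screen": [0.0, 0.0, 1.0],
}
lexicon = expand_lexicon({"good"}, toy_vectors)
print(sorted(lexicon))
reduced = pca_project(list(toy_vectors.values()), k=2)
print(reduced)
```

The reduced vectors would then be written out as features for a classifier such as SVMperf. That step is omitted here because the abstract does not describe the feature format. In practice, a numerical library would normally replace the hand-written PCA.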

