Computer Science (计算机科学) ›› 2014, Vol. 41 ›› Issue (9): 248-252. doi: 10.11896/j.issn.1002-137X.2014.09.047

• Artificial Intelligence •

SVM Sentiment Classifier Based on Semantic Distance for Web Comments

XIAO Zheng, LIU Hui and LI Bing

  1. College of Information Science and Engineering, Hunan University, Changsha 410012
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the Natural Science Foundation of Hunan Province (13JJ4038) and the Hunan University "Young Teacher Growth Plan".

Abstract: Sentiment orientation analysis can essentially be treated as a sentiment polarity classification problem. Against the background of massive data processing, and in order to improve the accuracy of text sentiment judgment, we propose a method for classifying the positive or negative sentiment orientation of text that combines latent semantic analysis (LSA) with a support vector machine (SVM). From a semantic perspective, LSA is used to build a "word-document" vector space model of semantic distances. An SVM, chosen for its good classification accuracy and generalization ability, then performs sentiment classification with the semantic distance vectors as input features. Experimental results show that, on Web comments with short sentences and fairly explicit sentiment orientation, the method improves accuracy to some extent over the traditional SVM method, with classification accuracy on the test set close to 88%.

Key words: Text processing, Semantic distance, Sentimental orientation classification, Latent semantic analysis
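
The two-stage pipeline outlined in the abstract (an LSA projection followed by an SVM) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy English comments, the raw term counts (no weighting scheme), the choice of k = 2 latent dimensions, and the helper names `term_document_matrix`, `lsa_fit`, `lsa_project`, `train_linear_svm` and `predict`. The abstract does not define the semantic distance vector precisely, so the sketch simply uses each document's coordinates in the latent space as its features. The classifier is a plain linear SVM trained by subgradient descent on the hinge loss, swapped in to keep the example dependent only on NumPy; it is not the authors' SVM configuration.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Count matrix with one row per term and one column per document."""
    index = {word: i for i, word in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            if word in index:
                X[index[word], j] += 1.0
    return X

def lsa_fit(X, k):
    """Rank-k truncated SVD of X; returns the term basis U_k and singular values s_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

def lsa_project(X, U_k, s_k):
    """Fold documents (columns of X) into the latent space: S_k^{-1} U_k^T x."""
    return ((U_k.T @ X) / s_k[:, None]).T

def train_linear_svm(F, y, lam=0.01, lr=0.1, epochs=500):
    """Minimise lam/2 * |w|^2 + mean hinge loss by full-batch subgradient descent."""
    n, d = F.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        violated = y * (F @ w + b) < 1.0  # points on the wrong side of the margin
        grad_w = lam * w - (y[violated][:, None] * F[violated]).sum(axis=0) / n
        grad_b = -y[violated].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(F, w, b):
    """Label +1 (positive) or -1 (negative) from the sign of the decision value."""
    return np.where(F @ w + b >= 0.0, 1, -1)

# Hypothetical toy corpus: three positive and three negative short comments.
train_docs = [
    "good great phone", "great screen good battery", "love good camera",
    "bad poor phone", "poor screen bad battery", "hate bad camera",
]
y_train = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
vocab = sorted({word for doc in train_docs for word in doc.split()})

X_train = term_document_matrix(train_docs, vocab)
U_k, s_k = lsa_fit(X_train, k=2)
F_train = lsa_project(X_train, U_k, s_k)
w, b = train_linear_svm(F_train, y_train)
train_pred = predict(F_train, w, b)

# Unseen comments are folded into the same latent space before classification.
test_docs = ["great camera", "poor battery"]
F_test = lsa_project(term_document_matrix(test_docs, vocab), U_k, s_k)
test_pred = predict(F_test, w, b)
print(train_pred.tolist(), test_pred.tolist())
```

Projecting new comments with the basis fitted on the training set, rather than recomputing the SVD, keeps training and test features in the same coordinate system, which the classifier needs in order to generalize.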

