Computer Science, 2017, Vol. 44, Issue (10): 254-258. doi: 10.11896/j.issn.1002-137X.2017.10.046

• Artificial Intelligence •

Review Spam Detection Approach Based on Topic Model and Sentiment Analysis

JIN Xiang-hong, LI Lin and ZHONG Luo

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the National Social Science Fund (15BGL048) and the National 863 Program (2015AA015403).

Abstract: With the rapid development of e-commerce, online shopping has gained increasing acceptance among consumers, and the product reviews it generates influence their purchase decisions. Product reviews are the evaluations of items that users post on shopping sites. Analysis shows that these reviews contain a large amount of spam, which can harm the shopping experience, so review spam detection has become one of the important problems to solve in improving e-commerce service quality. Based on the main characteristics of review spam, this paper proposes a review spam detection method called LDA-SP (LDA-Sentiment Polarity). First, the LDA topic model is used to filter out content-type spam, that is, reviews irrelevant to the product; then sentiment analysis is applied to identify deceptive (untruthful) spam reviews. Accuracy experiments on a large set of reviews from an online shopping mall show that the detection accuracy of LDA-SP is higher than that of the traditional LDA topic model alone and of sentiment polarity analysis alone. The method can effectively detect review spam, making product review information more objective and accurate and providing useful reference information for e-commerce users.

Key words: Product reviews, Review spam, Topic model, Sentiment analysis
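
The two-stage structure described in the abstract (a topical filter followed by a sentiment check) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds, features, or decision rules, so everything below is an assumption made for illustration. In particular, the word set `PRODUCT_TOPIC_WORDS` is a hypothetical stand-in for the top words of product-related topics that a trained LDA model might yield, the tiny `POSITIVE`/`NEGATIVE` lexicons are invented, and the rule that flags a review as deceptive when its sentiment polarity contradicts its star rating is an assumed criterion, not one stated in the source.

```python
import re
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    rating: int  # star rating from 1 to 5 (assumed to be available)


# Hypothetical stand-in for the top words of product-related LDA topics.
PRODUCT_TOPIC_WORDS = {"phone", "battery", "screen", "camera", "price", "delivery"}
# Hypothetical toy sentiment lexicons; a real system would use a full lexicon.
POSITIVE = {"great", "good", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "broken", "slow"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def topic_relevance(tokens: list[str]) -> float:
    """Fraction of tokens that belong to the product topic word set."""
    if not tokens:
        return 0.0
    return sum(t in PRODUCT_TOPIC_WORDS for t in tokens) / len(tokens)


def polarity(tokens: list[str]) -> int:
    """Positive-word count minus negative-word count."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def classify(review: Review, min_relevance: float = 0.1) -> str:
    """Two-stage cascade: topical filter first, then sentiment consistency."""
    tokens = tokenize(review.text)
    # Stage 1 (assumed rule): off-topic reviews count as content-type spam.
    if topic_relevance(tokens) < min_relevance:
        return "content_spam"
    # Stage 2 (assumed rule): sentiment that contradicts the rating is suspicious.
    score = polarity(tokens)
    if (score > 0 and review.rating <= 2) or (score < 0 and review.rating >= 4):
        return "deceptive_spam"
    return "normal"


if __name__ == "__main__":
    samples = [
        Review("Visit my site for cheap loans", 5),
        Review("Great phone, battery and screen are excellent", 5),
        Review("Terrible phone, the screen arrived broken", 5),
    ]
    for r in samples:
        print(classify(r), "|", r.text)
```

The cascade order mirrors the abstract: reviews rejected by the topical stage never reach the sentiment stage, so the second stage only has to judge reviews that are at least about the product.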

