Computer Science (计算机科学), 2019, Vol. 46, Issue (8): 50-55. doi: 10.11896/j.issn.1002-137X.2019.08.008

• Big Data and Data Science •

基于主题分析的用户评论聚类方法

张会兵1, 钟昊1, 胡晓丽2   

  1. (桂林电子科技大学广西可信软件重点实验室 广西 桂林541004)1
    (桂林电子科技大学教学实践部 广西 桂林541004)2
  • 收稿日期:2018-07-19 出版日期:2019-08-15 发布日期:2019-08-15
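  • Corresponding author: HU Xiao-li (born 1978), female, M.S., lecturer. Her main research interests include computer application technology and the Internet of Things. E-mail: huxiaoli@guet.edu.cn
  • About the authors: ZHANG Hui-bing (born 1976), male, Ph.D., associate professor. His main research interests include the Internet of Things/mobile Internet and embedded and mobile computing. ZHONG Hao (born 1993), male, master's student. His main research interests include the agricultural Internet of Things, social network computing and trusted computing. E-mail: 401924605@qq.com
  • Supported by:
    National Natural Science Foundation of China (61662013, U1501252, U1711263, 61662015, 61562014), Guangxi Science and Technology Major Project (AA17202024), Guangxi Natural Science Foundation (2017GXNSFAA198372, 2016GXNSFAA380149), and the fourth batch of "Teacher Growth Fund" projects of the Guangxi Normal University Education Development Foundation (EDF2015005)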

User Reviews Clustering Method Based on Topic Analysis

ZHANG Hui-bing1, ZHONG Hao1, HU Xiao-li2   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Practice and Experiment Station, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2018-07-19  Online: 2019-08-15  Published: 2019-08-15

摘要: 在社会化商务中对用户评论进行合理的聚类分析有利于商家提供精准服务或推荐信息,文中提出了一种基于主题分析的用户评论聚类方法。根据主题词在用户评论中的互信息强度以及主题词之间的相似度计算主题词权重,并依此构建用户评论主题向量。在此基础上,提出了一种基于用户评论相似度自动选择canopy聚类算法初始阈值的自适应canopy+kmeans聚类算法,对主题向量进行聚类分析。在亚马逊的评论数据上进行测试,结果表明:该方法充分描述了用户评论中不同主题词对用户观点的突出程度不同,并改善了K-means聚类算法易陷入局部最优的缺点,与传统的LDA+K-means算法相比,取得了更好的效果。

关键词: 用户评论, 主题分析, 主题向量, 自适应聚类

Abstract: Sound clustering of user reviews in social commerce helps merchants provide accurate services and recommendations. This paper proposes a user review clustering method based on topic analysis. The weight of each topic word is calculated from its mutual information strength in the user reviews and from the similarity between topic words, and these weights are used to construct a topic vector for each review. On this basis, an adaptive Canopy+K-means clustering algorithm is proposed, which automatically selects the initial threshold of the Canopy algorithm according to the similarity between user reviews and is then used to cluster the topic vectors. Experiments on Amazon review data show that the proposed method captures the differing degrees to which the topic words in a review express the user's opinion, and mitigates the tendency of K-means to fall into a local optimum. Compared with the traditional LDA+K-means algorithm, the proposed method achieves better results.

Key words: Adaptive clustering, Topic analysis, Topic vector, User reviews

CLC Number: 

  • TP399
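
The Canopy+K-means idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the Canopy threshold is derived from review similarity, so the rule used here (the mean pairwise Euclidean distance between topic vectors) is an assumption, as are all function names. The Canopy pass only supplies the number of clusters and the initial centers, and standard K-means (Lloyd's algorithm) then refines them.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_threshold(vectors):
    """Assumed rule (not from the paper): mean pairwise distance."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def canopy_centers(vectors, t2):
    """Simplified Canopy pass using only the tight threshold t2.

    Take the first remaining point as a center, then discard every
    remaining point that lies within t2 of that center.
    """
    remaining = list(vectors)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [v for v in remaining if dist(v, center) > t2]
    return centers


def kmeans(vectors, centers, max_iter=100):
    """Lloyd's algorithm seeded with the given initial centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for each vector.
        labels = [min(range(len(centers)), key=lambda k: dist(v, centers[k]))
                  for v in vectors]
        # Update step: mean of each cluster (keep old center if empty).
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def canopy_kmeans(vectors):
    """Choose the threshold from the data, seed with Canopy, run K-means."""
    t2 = adaptive_threshold(vectors)
    seeds = canopy_centers(vectors, t2)
    return kmeans(vectors, seeds)
```

Called on a list of topic vectors, `canopy_kmeans` returns one cluster label per vector together with the final centers. Because the seeds come from the Canopy pass, the number of clusters does not have to be fixed in advance.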