Computer Science, 2019, Vol. 46, Issue (8): 50-55. doi: 10.11896/j.issn.1002-137X.2019.08.008
ZHANG Hui-bing (张会兵)1, ZHONG Hao (钟昊)1, HU Xiao-li (胡晓丽)2
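Abstract: In social commerce, sound cluster analysis of user reviews helps merchants deliver precise services and recommendations. This paper proposes a user review clustering method based on topic analysis. The weight of each topic word is computed from its mutual-information strength in the user reviews and from the similarity between topic words, and these weights are used to build a topic vector for each review. On this basis, an adaptive canopy+K-means clustering algorithm is proposed, which automatically selects the initial threshold of the canopy algorithm from the similarity between user reviews, and it is applied to cluster the topic vectors. Tests on Amazon review data show that the method captures how strongly different topic words express a user's opinion, mitigates the tendency of K-means to become trapped in local optima, and achieves better results than the traditional LDA+K-means algorithm.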
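The abstract does not give the paper's threshold formula or distance measure, so the following is only a minimal sketch of the general canopy+K-means idea, not the authors' implementation. It assumes Euclidean distance between topic vectors and, purely for illustration, takes the canopy threshold to be the mean pairwise distance; the function names (`adaptive_threshold`, `canopy_centers`, `canopy_kmeans`) are hypothetical. Canopy centers fix both the number of clusters and the initial centroids, which is what removes the random initialization that makes plain K-means prone to poor local optima.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_threshold(points):
    """Hypothetical rule: use the mean pairwise distance as the canopy threshold."""
    n = len(points)
    if n < 2:
        return 0.0
    total = sum(dist(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def canopy_centers(points, t2):
    """Pick canopy centers: each chosen center removes all points within t2."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def canopy_kmeans(points):
    """Seed K-means with canopy centers found using a data-derived threshold."""
    t2 = adaptive_threshold(points)
    return kmeans(points, canopy_centers(points, t2))


# Toy stand-ins for review topic vectors: two well-separated groups.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = canopy_kmeans(vectors)
```

On this toy input the canopy pass should find one center per group, so K-means starts with two centroids without the caller ever specifying k.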