Computer Science ›› 2019, Vol. 46 ›› Issue (8): 50-55.doi: 10.11896/j.issn.1002-137X.2019.08.008

• Big Data & Data Science •

User Reviews Clustering Method Based on Topic Analysis

ZHANG Hui-bing1, ZHONG Hao1, HU Xiao-li2   

  1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Practice and Experiment Station, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received:2018-07-19 Online:2019-08-15 Published:2019-08-15

Abstract: Sound clustering analysis of user reviews in social business helps provide accurate services and recommendations. This paper proposes a user review clustering method based on topic analysis. The weight of each topic word is calculated from the mutual information strength of topic words in user reviews and the similarity between topic words, and the topic vectors of user reviews are constructed from these weights. On this basis, an adaptive canopy+K-means clustering algorithm is proposed that automatically selects the initial threshold of the canopy clustering algorithm based on the similarity of user reviews, and it is used to cluster the topic vectors. Results on Amazon review data show that the proposed method makes full use of the weights of different topic words in user reviews and mitigates the tendency of the K-means clustering algorithm to fall into local optima. Compared with the traditional LDA+K-means algorithm, the proposed method achieves better results.
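
The canopy+K-means idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the canopy threshold is derived from review similarity, so the sketch assumes the mean pairwise Euclidean distance between vectors as the threshold. The function names (`mean_pairwise_distance`, `canopy_centers`, `canopy_kmeans`) and the toy vectors are hypothetical, and the canopy pass is a simplified single-threshold variant.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(points):
    """ASSUMED threshold heuristic: average distance over all point pairs."""
    n = len(points)
    if n < 2:
        return 0.0
    total = sum(euclidean(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def canopy_centers(points, t2):
    """Simplified canopy pass: take the next remaining point as a center,
    drop every remaining point within t2 of it, repeat until none remain."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if euclidean(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda k: euclidean(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def canopy_kmeans(points):
    """Seed K-means with canopy centers; k is the number of canopies."""
    t2 = mean_pairwise_distance(points)  # assumption, not from the paper
    return kmeans(points, canopy_centers(points, t2))


# Toy "topic vectors" forming two well-separated groups (made-up data).
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = canopy_kmeans(vectors)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding K-means from canopy centers fixes both the number of clusters and their starting positions from the data, which is the usual motivation for pairing the two algorithms when random initialization tends to land in poor local optima.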

Key words: User reviews, Topic analysis, Topic vector, Adaptive clustering

CLC Number: TP399