Computer Science, 2017, Vol. 44, Issue (2): 257-261, 274. doi: 10.11896/j.issn.1002-137X.2017.02.042

Micro-blog Topic Detection Method Integrating BTM Topic Model and K-means Clustering

LI Wei-jiang, WANG Zhen-zhen and YU Zheng-tao   

  • Online: 2018-11-13  Published: 2018-11-13

Abstract: The recent growth of micro-blogging has made communication more convenient. Because each micro-blog post is limited to 140 characters, short texts are produced on a large scale, and discovering topics from them has become a difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), have difficulty modeling short texts because they suffer from severe data sparsity. The K-means clustering algorithm, in turn, can make topics discriminative when the dataset is dense and the differences between documents on different topics are distinct. To alleviate the sparsity problem, this paper employs the biterm topic model (BTM) to process short micro-blog texts, and then integrates the K-means clustering algorithm with BTM for further topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that the method can discover topics effectively.
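
To illustrate why a biterm topic model is less affected by sparsity than a document-level model, the following minimal Python sketch extracts biterms (unordered pairs of words that co-occur in one short text) and counts them over a whole collection. It is only an illustration under stated assumptions, not the authors' implementation: the function names `extract_biterms` and `count_biterms` are hypothetical, the input is assumed to be already tokenized, and the whole post is treated as one co-occurrence window.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one tokenized short text.

    Assumption: the whole post is a single co-occurrence window.
    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over a collection of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy posts, already tokenized.
posts = [["flood", "rescue"], ["rescue", "flood", "team"]]
biterm_counts = count_biterms(posts)
```

Because the counts are pooled over the whole collection rather than kept per post, a topic model can be estimated from corpus-level co-occurrence patterns even when each individual post contains only a few words. The per-document topic distributions inferred from such a model could then be passed as feature vectors to a K-means clusterer, which is one plausible reading of the integration described in the abstract.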

Key words: Short text, Topic model, Topic discovery, K-means clustering

