Computer Science, 2016, Vol. 43, Issue (Z11): 443-446, 450. doi: 10.11896/j.issn.1002-137X.2016.11A.099

Short Text Clustering Algorithm Combined with Context Semantic Information

ZHANG Qun, WANG Hong-jun and WANG Lun-wen   

Online: 2018-12-01; Published: 2018-12-01

Abstract: Because short texts suffer from insufficient information, high dimensionality and feature sparsity, conventional text clustering methods have limited effectiveness when applied to them. To address this, this paper proposes a novel short text clustering algorithm that incorporates contextual semantic information. First, drawing on the ideas of centrality and prestige from social network analysis, the algorithm improves the conventional feature weight calculation by taking the semantic information in the context into account. On this basis, it constructs the term-document matrix and performs singular value decomposition (SVD) on that matrix, mapping the original high-dimensional term vector space to a lower-dimensional latent semantic space. Finally, it clusters the short texts in this latent semantic space using an improved K-means clustering algorithm. Experimental results show that, compared with traditional text clustering methods, the proposed scheme effectively alleviates the information insufficiency, high dimensionality and feature sparsity of short texts, and greatly improves the evaluation indicators of short text clustering.
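The pipeline described in the abstract (weight terms using context, build a term-document matrix, truncate its SVD, cluster in the latent space) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper's centrality/prestige weighting is replaced by a hypothetical factor, the normalized degree centrality of each term in a sliding-window co-occurrence graph. TF-IDF is an assumed base weight. Plain K-means with deterministic farthest-point seeding stands in for the paper's improved K-means. All function names (`context_centrality`, `term_document_matrix`, `latent_doc_vectors`, `kmeans`, `cluster_short_texts`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

import numpy as np


def context_centrality(docs, window=1):
    """HYPOTHETICAL stand-in for the paper's centrality/prestige weighting:
    normalized degree centrality of each term in a co-occurrence graph that
    links terms appearing within `window` positions of each other."""
    neighbors = defaultdict(set)
    for doc in docs:
        for i, term in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            neighbors[term].update(o for o in doc[lo:hi] if o != term)
    denom = max(len(neighbors) - 1, 1)
    return {t: len(nbrs) / denom for t, nbrs in neighbors.items()}


def term_document_matrix(docs):
    """Rows are terms, columns are documents; weight = TF-IDF * (1 + centrality)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    centrality = context_centrality(docs)
    A = np.zeros((len(vocab), n_docs))
    for j, doc in enumerate(docs):
        for t, tf in Counter(doc).items():
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1.0  # smoothed IDF (assumption)
            A[index[t], j] = tf * idf * (1.0 + centrality[t])
    return A, vocab


def latent_doc_vectors(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T; each document becomes a row of V_k S_k."""
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return vt[:k].T * s[:k]


def kmeans(X, n_clusters, n_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-point seeding
    (NOT the paper's improved K-means, whose details are not given here)."""
    # Unit-length rows: squared Euclidean distance = 2 - 2 * cosine similarity.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    centers = [X[0]]
    while len(centers) < n_clusters:
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels.tolist()


def cluster_short_texts(docs, n_clusters, k):
    """Weighted term-document matrix -> k-dim latent space -> K-means labels."""
    A, _ = term_document_matrix(docs)
    return kmeans(latent_doc_vectors(A, k), n_clusters)


if __name__ == "__main__":
    docs = [s.split() for s in [
        "apple banana fruit", "banana fruit juice", "apple fruit juice",
        "gpu cuda kernel", "cuda kernel thread", "gpu kernel thread",
    ]]
    print(cluster_short_texts(docs, n_clusters=2, k=2))  # [0, 0, 0, 1, 1, 1]
```

Keeping only the top k singular triplets is what maps each document from the |V|-dimensional term space into a k-dimensional latent space. Row-normalizing before clustering makes Euclidean distance a monotone function of cosine similarity, a common choice for LSA-style vectors.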

Key words: Short text clustering, Context semantic information, Singular value decomposition, K-means clustering algorithm

