Computer Science ›› 2016, Vol. 43 ›› Issue (5): 243-246. doi: 10.11896/j.issn.1002-137X.2016.05.045

Improved Text Clustering Algorithm Based on Kolmogorov Complexity

WANG You-hua and CHEN Xiao-rong   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Clustering algorithms based on Kolmogorov complexity have the advantages of generality and parameter independence, but consistently show low accuracy when applied to clustering texts by their semantic information. To solve this problem, this paper proposes DEF-KC, a text clustering algorithm based on feature extension. To increase the contribution of keywords to the theme of a text, DEF-KC applies feature extension to the keywords in the preprocessed text by referencing the information in the corresponding entries of Baidu Encyclopedia, and calculates text similarity from the approximate Kolmogorov complexity of the texts. Finally, it clusters the texts using a spectral clustering algorithm. The experimental results show that the proposed algorithm achieves much better accuracy and recall than the traditional text clustering algorithm based on Kolmogorov complexity.

Key words: Kolmogorov complexity, Text clustering, Feature extension, Spectral clustering
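
The abstract does not say how the approximate Kolmogorov complexity is computed. One common approach, shown here only as an illustrative sketch and not as the authors' method, is to use the length of a compressed text as a proxy for its complexity and to derive a normalized compression distance (NCD) from it. The compressor (zlib), the NCD formula, and the function names below are all assumptions introduced for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts (assumed measure)."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def similarity_matrix(texts: list[str]) -> list[list[float]]:
    """Pairwise similarities 1 - NCD, clamped to [0, 1].

    Real compressors can push NCD slightly outside [0, 1], so the values
    are clamped to keep them usable as non-negative affinities.
    """
    n = len(texts)
    return [
        [min(1.0, max(0.0, 1.0 - ncd(texts[i], texts[j]))) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat " * 20,
        "the cat sat on the rug " * 20,
        "quantum chromodynamics lattice gauge theory " * 20,
    ]
    for row in similarity_matrix(docs):
        print([round(v, 2) for v in row])
```

A matrix of this kind could then be passed as a precomputed affinity to a spectral clustering routine. In a pipeline like the one the abstract describes, the feature-extended text, rather than the raw text, would be the input to the compressor.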

