Computer Science, 2016, Vol. 43, Issue (5): 223-229. doi: 10.11896/j.issn.1002-137X.2016.05.041


Research of Text Data Streams Clustering Algorithm Based on Affinity Propagation

LI Yi-ming, NI Li-ping, FANG Qing-hua and LIU Hui-ting   

  • Online: 2018-12-01   Published: 2018-12-01

Abstract: With the advent of the big data era, large volumes of unstructured text data streams have emerged online. These streams are dynamic, high-dimensional, and sparse. To address these characteristics, this paper proposes OAP-s, a text data stream clustering algorithm built on the traditional affinity propagation (AP) algorithm. By introducing an attenuation factor into AP, OAP-s passes the cluster centers of the current window to the next window while attenuating those results. OAP-s has some shortcomings, however, so we propose a second text data stream clustering algorithm, OWAP-s. Building on OAP-s, OWAP-s defines a weighted similarity and introduces an attraction factor that makes historical cluster centers more attractive, which yields more accurate clustering results. Both algorithms adopt the sliding time window model, which reflects both the temporal characteristics and the distribution of the data stream. Experimental results show that both algorithms are flexible and extensible, and that they are more accurate and stable than the OSKM algorithm.
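The carry-over idea in the abstract (cluster centers passed from one window to the next and attenuated along the way) can be illustrated with a small sketch. This is not the paper's OAP-s or OWAP-s implementation: the abstract gives no formulas, so the multiplicative `decay`, the `min_weight` pruning threshold, the `Center` record, and every function name below are hypothetical. The per-window clustering step is a simple leader-style assignment standing in for affinity propagation, used only to keep the example short and deterministic.

```python
import math
from dataclasses import dataclass


@dataclass
class Center:
    """Hypothetical record for a cluster center carried between windows."""
    vector: dict   # sparse term -> weight representation of the center
    weight: float  # how much (decayed) history still supports this center


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attenuate(centers, decay=0.5, min_weight=0.1):
    """Assumed attenuation rule: multiply each weight by `decay`
    and drop centers whose weight falls below `min_weight`."""
    kept = []
    for c in centers:
        w = c.weight * decay
        if w >= min_weight:
            kept.append(Center(c.vector, w))
    return kept


def cluster_window(docs, carried, threshold=0.3):
    """Stand-in for the AP step (NOT affinity propagation): each document
    joins the most similar existing center or founds a new one."""
    centers = list(carried)
    for doc in docs:
        best = max(centers, key=lambda c: cosine(doc, c.vector), default=None)
        if best is not None and cosine(doc, best.vector) >= threshold:
            best.weight += 1.0  # fresh support for an existing center
        else:
            centers.append(Center(dict(doc), 1.0))
    return centers


def process_stream(windows, decay=0.5, min_weight=0.1):
    """Cluster each window, seeding it with the attenuated centers
    inherited from the previous window."""
    centers = []
    for docs in windows:
        carried = attenuate(centers, decay, min_weight)
        centers = cluster_window(docs, carried)
    return centers


if __name__ == "__main__":
    # Toy stream: a "stock" topic appears once, then "football" takes over.
    stream = [
        [{"stock": 1.0, "market": 1.0}, {"stock": 1.0, "price": 1.0}],
        [{"football": 1.0, "goal": 1.0}],
        [{"football": 1.0, "match": 1.0}],
        [{"football": 1.0, "goal": 1.0}],
    ]
    for c in process_stream(stream, decay=0.5, min_weight=0.3):
        print(sorted(c.vector), c.weight)
```

In this toy stream the early topic loses weight in every window where it receives no new documents and is eventually pruned, while the recurring topic keeps being reinforced. Under the abstract's description, OWAP-s would additionally bias similarity toward historical centers; that weighting is not sketched here because its definition is not given in the abstract.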

Key words: Data mining, AP clustering, Text data, Sliding window, Weight

