Computer Science, 2015, Vol. 42, Issue (3): 201-205. doi: 10.11896/j.issn.1002-137X.2015.03.041

Clustering Algorithm CARDBK Improved from K-means Algorithm

ZHU Ye-hang, LI Yan-ling, CUI Meng-tian and YANG Xian-wen   

Online: 2018-11-14  Published: 2018-11-14

Abstract: Our clustering algorithm differs from the batch K-means algorithm in that each point is not attributed to a single cluster only. Instead, it affects the centroids of several clusters, and the degree to which a point influences a given cluster centroid depends on the distances between that point and the other cluster centroids that are nearer to it. We clustered a number of data sets with our algorithm and with several other algorithms, and compared five performance indices of the clustering results: entropy, purity, F1 value, Rand index and normalized mutual information. The results show that our algorithm produces better clustering results. We also ran our algorithm and the other algorithms on the same data set under many different initialization conditions, and the clustering results of our algorithm were more stable and better. Finally, clustering data sets of different sizes with our algorithm shows that it scales linearly and runs faster.
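The abstract describes the centroid update only qualitatively, so the following is a minimal Python sketch under stated assumptions, not the authors' CARDBK implementation. It keeps the batch K-means loop but lets every point contribute to several centroids. The weighting function `influence_weights` and its exponent `p` are hypothetical: the nearest centroid receives weight 1, and each farther centroid receives the product, over all centroids nearer to the point, of the ratio (nearer distance / farther distance) raised to `p`. Two of the five evaluation indices named in the abstract, purity and the Rand index, are included using their standard definitions.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def influence_weights(point, centroids, p=2.0):
    """HYPOTHETICAL weighting (an assumption, not the CARDBK formula).

    The nearest centroid gets weight 1. Every farther centroid gets the
    product, over all centroids nearer to the point, of (d_near / d_far) ** p,
    so its weight shrinks as the nearer centroids become relatively closer.
    """
    d = [math.sqrt(sq_dist(point, c)) for c in centroids]
    order = sorted(range(len(centroids)), key=lambda j: d[j])
    weights = [0.0] * len(centroids)
    for rank, j in enumerate(order):
        w = 1.0
        for i in order[:rank]:
            # If d[j] == 0, a nearer centroid also sits on the point: give 0.
            w *= (d[i] / d[j]) ** p if d[j] > 0 else 0.0
        weights[j] = w
    return weights


def soft_batch_kmeans(points, k, p=2.0, iters=100, seed=0):
    """Batch K-means variant in which each point pulls on several centroids."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for x in points:
            for j, w in enumerate(influence_weights(x, centroids, p)):
                mass[j] += w
                for t in range(dim):
                    sums[j][t] += w * x[t]
        # Weighted mean per centroid; keep the old centroid if nothing pulls it.
        new = [[s / mass[j] for s in sums[j]] if mass[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    # Report hard labels: each point goes to its nearest final centroid.
    labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
              for x in points]
    return centroids, labels


def purity(labels, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, Counter())[cls] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(labels)


def rand_index(labels, classes):
    """Share of point pairs on which the clustering and the truth agree."""
    n = len(labels)
    agree = sum((labels[a] == labels[b]) == (classes[a] == classes[b])
                for a in range(n) for b in range(a + 1, n))
    return agree / (n * (n - 1) / 2)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    truth = [0, 0, 0, 1, 1, 1]
    _, labs = soft_batch_kmeans(pts, k=2)
    print(purity(labs, truth), rand_index(labs, truth))
```

As `p` grows, the weights of all centroids other than the nearest approach 0 (except for exact distance ties), and the loop reduces to ordinary batch K-means; this is one reason to choose a weighting of this shape, but any specific formula remains an assumption until checked against the paper.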

Key words: Clustering, Text clustering, Document clustering, K-means, Algorithm

[1] 朱烨行.文档聚类算法研究[D].西安:西北工业大学,2009
[2] Zhao Ying,Karypis G.Criterion functions for document clustering:Experiments and analysis[R/OL].2003-04-23[2008-10-29].http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download
[3] Anon.an Introduction to Cluster Analysis for Data Mining[EB/OL].2000-02-10[2008-12-2].http://www.dol88.com/p-567183494975.html
[4] 刘泉凤,陆蓓,王小华.文本挖掘中聚类算法的比较研究[J].计算机时代,2005(6):7-8,22
[5] 谷波,张永奎.文本聚类算法的分析与比较[J].电脑开发与应用,2003,16(11):4-6
[6] Bernd F.Some Competitive Learning Methods[R/OL].1997-04-05[2008-10-22].http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/JavaPaper/
[7] Ridella S,Rovetta S,Zunino R.Plastic Algorithm for AdaptiveVector Quantisation[J].Neural Computing & Applications,1998,7(1):37-51
[8] Pal N R,Bezdek J C,Tsao E C K.Generalized Clustering Networks and Kohonen’s Self-Organizing Scheme[J].IEEE Transaction on Neural Networks,1993,4(4):549-557
[9] Hansen P,Mladenovic N.J-Means:A New Local Search Heuristic for Minimum Sum-of-Squares Clustering[J].Pattern Recognition,2001,34(2):405-413
[10] 唐春生,张磊,潘东,等.文本分类研究进展[EB/OL].[2008-10-24].http://c.xml.org.cn/blog/uploadfile/20076211443809.PDF
[11] Karypis G.CLUTO- Software for Clustering High-Dimensional Datasets[CP/OL].2008[2008-10-25].http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download
[12] Han E H,Boley D,Gini M,et al.WebACE:A webagent for do-cument categorization and exploration[C]∥Proceedings of the Second International Conference on Autonomous Agents.Minneapolis,Minnesota,United States:ACM,1998:408-415
[13] Beil F,Ester M,Xu Xiao-wei.Frequent term-based text clustering[C]∥Proceedings of the Eighth ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining.New York:ACM,2002:436-442
[14] Karypis G,Han Eui-hong.Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval[C]∥Proceedings of the Ninth International Confe-rence on Information and Knowledge Management.New York:ACM,2000:12-19
[15] Hersh W,Buckley C,Leone T,et al.OHSUMED:An Interactive Retrieval Evaluation and New Large Test Collection for Research[C]∥Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Dublin,Ireland:ACM,1994:192-201
[16] Juan A,Vidal E.Comparison of Four Initialization Techniques for the K-Medians Clustering Algorithm[C]∥Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition.London,UK:Springer-Verlag,2000:842-852
[17] 梁冯珍,宋占杰,张玉环.应用概率统计[M].天津:天津大学出版社,2004
[18] Hanselman D,Littlefield B.精通MATLAB:综合辅导与掼[M].李人厚,张平安,译.西安:西安交通大学出版社,1998
[19] Velleman P F,Hoaglin D C.Applications,Basics,and Computing of Exploratory Data Analysis[M].Boston:Duxbury Press,c2004
[20] Macqueen J.Some methods of classification and analysis of multivariate observations[M]∥Le Cam L M,Neyman J,ed..Proc.of the fifth Berkeley Symposium on Mathematical Statistics and Probability.Los Angeles,USA:University of California Press,1967:281-297

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75, 88 .
[2] XIA Qing-xun and ZHUANG Yi. Remote Attestation Mechanism Based on Locality Principle[J]. Computer Science, 2018, 45(4): 148 -151, 162 .
[3] LI Bai-shen, LI Ling-zhi, SUN Yong and ZHU Yan-qin. Intranet Defense Algorithm Based on Pseudo Boosting Decision Tree[J]. Computer Science, 2018, 45(4): 157 -162 .
[4] WANG Huan, ZHANG Yun-feng and ZHANG Yan. Rapid Decision Method for Repairing Sequence Based on CFDs[J]. Computer Science, 2018, 45(3): 311 -316 .
[5] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[6] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[7] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[8] LIU Qin. Study on Data Quality Based on Constraint in Computer Forensics[J]. Computer Science, 2018, 45(4): 169 -172 .
[9] ZHONG Fei and YANG Bin. License Plate Detection Based on Principal Component Analysis Network[J]. Computer Science, 2018, 45(3): 268 -273 .
[10] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99, 116 .