计算机科学 ›› 2015, Vol. 42 ›› Issue (3): 201-205.doi: 10.11896/j.issn.1002-137X.2015.03.041
朱烨行,李艳玲,崔梦天,杨献文
ZHU Ye-hang, LI Yan-ling, CUI Meng-tian and YANG Xian-wen
摘要: CARDBK聚类算法与批K-means算法的不同之处在于,每个点不是只归属于一个簇,而是同时影响多个簇的质心值,一个点影响某一个簇的质心值的程度取决于该点与其它离该点更近的簇的质心之间的距离值。 从聚类结果的熵、纯度、F1值、Rand Index和NMI等5个性能指标值来看,与多个不同算法在多个不同数据集上分别聚类相比, 该算法具有较好的聚类结果;与多个不同算法在同一数据集上很多不同的初始化条件下分别聚类相比,该算法具有较好且稳定的聚类结果;该算法在不同大小数据集上聚类时具有线性伸缩性且速度较快。
[1] 朱烨行.文档聚类算法研究[D].西安:西北工业大学,2009 [2] Zhao Ying,Karypis G.Criterion functions for document clustering:Experiments and analysis[R/OL].2003-04-23[2008-10-29].http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download [3] Anon.an Introduction to Cluster Analysis for Data Mining[EB/OL].2000-02-10[2008-12-2].http://www.dol88.com/p-567183494975.html [4] 刘泉凤,陆蓓,王小华.文本挖掘中聚类算法的比较研究[J].计算机时代,2005(6):7-8,22 [5] 谷波,张永奎.文本聚类算法的分析与比较[J].电脑开发与应用,2003,16(11):4-6 [6] Bernd F.Some Competitive Learning Methods[R/OL].1997-04-05[2008-10-22].http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/JavaPaper/ [7] Ridella S,Rovetta S,Zunino R.Plastic Algorithm for AdaptiveVector Quantisation[J].Neural Computing & Applications,1998,7(1):37-51 [8] Pal N R,Bezdek J C,Tsao E C K.Generalized Clustering Networks and Kohonen’s Self-Organizing Scheme[J].IEEE Transaction on Neural Networks,1993,4(4):549-557 [9] Hansen P,Mladenovic N.J-Means:A New Local Search Heuristic for Minimum Sum-of-Squares Clustering[J].Pattern Recognition,2001,34(2):405-413 [10] 唐春生,张磊,潘东,等.文本分类研究进展[EB/OL].[2008-10-24].http://c.xml.org.cn/blog/uploadfile/20076211443809.PDF [11] Karypis G.CLUTO- Software for Clustering High-Dimensional Datasets[CP/OL].2008[2008-10-25].http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download [12] Han E H,Boley D,Gini M,et al.WebACE:A webagent for do-cument categorization and exploration[C]∥Proceedings of the Second International Conference on Autonomous Agents.Minneapolis,Minnesota,United States:ACM,1998:408-415 [13] Beil F,Ester M,Xu Xiao-wei.Frequent term-based text clustering[C]∥Proceedings of the Eighth ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining.New York:ACM,2002:436-442 [14] Karypis G,Han Eui-hong.Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval[C]∥Proceedings of the Ninth International Confe-rence on Information and Knowledge Management.New York:ACM,2000:12-19 [15] Hersh W,Buckley C,Leone T,et al.OHSUMED:An Interactive Retrieval Evaluation and New Large Test Collection for Research[C]∥Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Dublin,Ireland:ACM,1994:192-201 [16] Juan A,Vidal E.Comparison of Four Initialization Techniques for the K-Medians Clustering Algorithm[C]∥Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition.London,UK:Springer-Verlag,2000:842-852 [17] 梁冯珍,宋占杰,张玉环.应用概率统计[M].天津:天津大学出版社,2004 [18] Hanselman D,Littlefield B.精通MATLAB:综合辅导与掼[M].李人厚,张平安,译.西安:西安交通大学出版社,1998 [19] Velleman P F,Hoaglin D C.Applications,Basics,and Computing of Exploratory Data Analysis[M].Boston:Duxbury Press,c2004 [20] Macqueen J.Some methods of classification and analysis of multivariate observations[M]∥Le Cam L M,Neyman J,ed..Proc.of the fifth Berkeley Symposium on Mathematical Statistics and Probability.Los Angeles,USA:University of California Press,1967:281-297 |
No related articles found! |
|