计算机科学 ›› 2015, Vol. 42 ›› Issue (1): 232-235.doi: 10.11896/j.issn.1002-137X.2015.01.051

• 人工智能 • 上一篇    下一篇

一种AP算法的改进:M-AP聚类算法

甘月松,陈秀宏,陈晓晖   

  1. 江南大学数字媒体学院 无锡214122,江南大学数字媒体学院 无锡214122,江南大学数字媒体学院 无锡214122
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金(61373055)资助

Improved AP Algorithm:M-AP Clustering Algorithm

GAN Yue-song, CHEN Xiu-hong and CHEN Xiao-hui   

  • Online:2018-11-14 Published:2018-11-14

摘要: Affinity Propagation(AP)聚类算法将所有数据点作为潜在的聚类中心,在相似度矩阵的基础上通过消息传递进行聚类。与传统聚类方法相比,对于大规模数据集,AP是一种快速、有效的聚类方法。但是AP算法在聚类结构复杂的(非团状)数据集上得到的效果并不是很好。因此,在AP的基础上加入一个merge过程,将AP算法改进为M-AP算法,可以有效地解决这种问题。而当样本数目比较大时,将CVM压缩算法融入其中,可以有效地解决大样本问题。

关键词: 聚类,Affinity propagation(AP算法),M-AP,合并过程,CVM压缩

Abstract: Affinity propagation(AP) clustering simultaneously considers all data points as potential exemplars.It takes similarity between pairs of data points as input measures,and clusters gradually during the message-passing procedure.But the result of AP clustering algorithm in the data set of complex structure(non-group)is not very good.Therefore,we proposed a new clustering algorithm by adding a merge process on the basis of AP clustering algorithm,called M-AP algorithm which can effectively solve this kind of problem.When the number of samples is very large, the problem of large sample can be effectively solved by using CVM compression algorithm.

Key words: Clustering algorithm,Affinity propagation,Merge-AP,Merge process,CVM compress

[1] Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,5(5814):972-976
[2] Pollard D.Strong consistency of Kmeans clustering[J].Ailnals of Statistics,1981,9(1):135-140
[3] Zhang T,Ramakrishnan R,Livny M.BIRCH.An efficient data clustering method for very large databases[J].Montreal,1996,6(96):103-114
[4] Pal N R,Bezdek J C.On cluster validity for the fuzzy c-means model[J].IEEE Transactions on Fuzzy Systems,1995,3(3):370-379
[5] Tsang I W,Kwok J T,Cheung P M.Core vector machines:fast SVM training on very large data sets[J].Journal of Machine Learning Research,2005,8(6):363-392
[6] Deng Zhao-hong,Choi K S,Chung F L,et al.Enhanced soft sub-space clustering integrating within cluster and between blusterInformation[J].Pattern Recognition,2010,43(3):767-781
[7] Liu Jun,Mohammed J,Carter J,et al.Distance based clustering of CGH data[J].Bioinformatics,2006,22(16):1971-1978
[8] Chen W Y,Song Y,Bai H,et al.Parallel spectral clustering in distributed systems[J].IEEE Transactions onPattern Analysis and Machine Intelligence,2011,33(3):568-586
[9] Wu M,Schlkopf B.A local learning approach for clustering[J].Proc.Conf.Neural Information Processing Systems,2007(1):1529-1536
[10] Papadimitriou C H,Steiglitz K.Combinatorial Optimization:Algorithms and Complexity[M].Dover Publications,1998
[11] Optdigits数据集.https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/
[12] Iris数据集.http://archive.ics.uci.edu/ml/datasets/Iris
[13] Clean数据集.http://archive.ics.uci.edu/ml/machine-learning-databases/musk/
[14] Soybean数据集.http://archive.ics.uci.edu/ml/datasets/Soybean+(Small)
[15] 孙吉贵,刘杰,赵连宇.聚类算法研究[J].Journal of Software,2008,19(1):48-61
[16] 肖宇,于剑.基于近邻传播算法的半监督聚类[J].Journal of Software,2008,19(11):2803-2813
[17] 牟廉明,詹德川,黎铭,等.基于结构相似性和压缩变换的聚类方法[J].Pattern Recognition and Artificial Intelligence,2011,24(5):637-644
[18] 于剑,程乾生.模糊聚类方法中的最佳聚类数的搜索范围[J].中国科学E辑,2002,2(2):274-280
[19] 杨善林,李永森,胡笑旋,等.K-means算法中的k值优化问题研究[J].系统工程理论与实践,2006,26(2):97-101
[20] 周水庚,周傲英,曹晶.基于数据分区的DBSCAN算法[J].计算机研究与发展,2000(10):1153-1159
[21] Ester M.A density-based algorithm for discovering clusters in large spatial databases with noise[C]∥ Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining,Portland:AAAI Press.1996:226-239
[22] 计华,张化祥,孙晓燕.基于最近邻原则的半监督聚类算法 [J].计算机工程与设计,2011,2(7):2455-2459
[23] 李昆仑,曹铮,曹丽苹.半监督聚类的若干新进展[J].模式识别与人工智能,2009(5):735-742
[24] DBSCAN算法代码.http://wenku.baidu.com/view/47a26ebbba0d4a7302763a9c.html

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!