Computer Science ›› 2014, Vol. 41 ›› Issue (3): 185-188.

• Artificial Intelligence •

Hierarchical Affinity Propagation Clustering for Large-Scale Data Sets

LIU Xiao-nan, YIN Mei-juan, LI Ming-tao, YAO Dong, CHEN Wu-ping

  1. PLA Information Engineering University, Zhengzhou 450001; Key Laboratory of Information Assurance Technology, Beijing 100072
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the Open Fund of the Key Laboratory of Information Assurance Technology (KJ-12-04)

Abstract: Affinity Propagation (AP) clustering is fast and accurate and does not require the number of clusters to be set in advance, but it does not scale to large data sets. Hierarchical Affinity Propagation (HAP) is proposed to address this problem. First, the data set is divided into several subsets that are small enough for AP to run efficiently, and AP selects the exemplars (cluster centers) of each subset. Then AP is run again on all the subset exemplars to select the global exemplars of the whole data set. Finally, every data point is assigned to a cluster according to its similarity to these global exemplars, which yields efficient clustering of large-scale data. Experimental results on real and simulated data sets show that, compared with traditional AP and adaptive AP, HAP greatly reduces clustering time while maintaining good clustering quality.

Key words: Data clustering, Affinity propagation, Hierarchical selection, Cluster center

CLC Number: TP301.6    Document code: A
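
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes negative squared Euclidean distance as the similarity, the median similarity as the AP preference, a damping factor of 0.9, a fixed iteration count, and a random split into equal-size subsets; the abstract specifies none of these. The names `similarity`, `affinity_propagation`, and `hap` are hypothetical.

```python
import random
from statistics import median


def similarity(p, q):
    """Negative squared Euclidean distance (an assumed similarity measure)."""
    return -sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.9, iterations=200):
    """Plain AP; returns indices (into `points`) of the chosen exemplars."""
    n = len(points)
    if n <= 2:
        return list(range(n))  # too small to cluster: keep every point
    S = [[similarity(p, q) for q in points] for p in points]
    # Assumption: preference = median of the off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback when no point crosses the threshold
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return exemplars


def hap(points, subset_size=50, seed=0):
    """Hierarchical AP sketch; returns (global_exemplars, labels)."""
    n = len(points)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # assumed: random partition
    # Step 1: elect exemplars inside each subset.
    local = []
    for start in range(0, n, subset_size):
        idx = order[start:start + subset_size]
        chosen = affinity_propagation([points[i] for i in idx])
        local.extend(idx[j] for j in chosen)
    # Step 2: run AP again on the pooled subset exemplars.
    chosen = affinity_propagation([points[i] for i in local])
    exemplars = [local[j] for j in chosen]
    # Step 3: assign every point to its most similar global exemplar.
    labels = [max(exemplars, key=lambda e: similarity(points[i], points[e]))
              for i in range(n)]
    return exemplars, labels
```

Calling `hap(points, subset_size=20)` on a list of coordinate tuples returns the indices of the global exemplars and, for each point, the index of the exemplar it is assigned to. Each AP run builds a full similarity matrix, so its cost grows quadratically with the number of points it sees; keeping every run at subset size is what makes the hierarchical scheme cheaper than a single AP run over all the data.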

