计算机科学 ›› 2013, Vol. 40 ›› Issue (7): 167-172.

• 人工智能 • 上一篇    下一篇

基于混合测度的并行仿射传播聚类算法

张建朋,陈福才,李邵梅,于洪涛   

  1. 国家数字交换系统工程技术研究中心 郑州450002;国家数字交换系统工程技术研究中心 郑州450002;国家数字交换系统工程技术研究中心 郑州450002;国家数字交换系统工程技术研究中心 郑州450002
  • 出版日期:2018-11-16 发布日期:2018-11-16
  • 基金资助:
    本文受国家高技术研究发展计划(“863”计划),国家重点基础研究发展计划(“973”计划)基金资助

Parallel Affinity Propagation Clustering Algorithm Based on Hybrid Measure

ZHANG Jian-peng,CHEN Fu-cai,LI Shao-mei and YU Hong-tao   

  • Online:2018-11-16 Published:2018-11-16

摘要: 针对仿射传播聚类(AP)算法应用于流形结构复杂、密度不均匀的数据集存在的不足,通过学习数据集的低维流形结构,提出了密度自适应的“流形距离核”(ad-MDK)的概念。该距离测度既考虑了数据点的局部密度信息,又包含了数据集全局结构信息,从而提高了算法对这类数据集的处理能力。同时,针对引入流形距离所带来的计算复杂问题,提出了算法的并行化设计方法,有效提高了算法处理效率。通过在多个数据集上的实验验证了所提算法在处理大规模多尺度数据集上的性能优于传统AP算法。

关键词: 仿射传播聚类,流形距离核,共享最近邻,并行计算 中图法分类号TP181文献标识码A

Abstract: Affinity propagation clustering (AP) algorithm has a difficult to get ideal clustering results in a complex manifold structure and non-uniform density datasets.Through studying the low-dimensional manifold structure,this paper drawed out the density auto-adapted "manifold distance kernel " concept(ad-MDK),which takes into account the local density information of data points,also contains the overall structure of the data set globally,making the algorithm can sovle complex distributed data clustering problem.Meanwhile,in order to reduce the manifold distance calculated,parallel algorithm for the proposed algorithm was introduced to the affinity propagation clustering to effectively improve the speed of the algorithm.Experiments on several data sets verify that the proposed algorithm is superior to the traditional AP algorithm performance in dealing with large-scale multi-scale data set.

Key words: Affinity propagation,Manifold distance kernel,Shared nearest neighbor,Parallel computation

[1] Frey B J,Dueck D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976
[2] Jain A K.Data clustering:50years beyond K-means[J].Pattern Recognition lett,2009,9(11)
[3] von Luxburg U.A tutorial on spectral clustering[R].Technical report.Max Planck Institute for Biological Cybernetics,2006
[4] Wang K,Zhang J,Li D,et al.Adaptive Affinity PropagationClustering[J].Acta Automatica Sinica,2007,33(12):1242-1246
[5] 肖宇,于剑.基于近邻传播算法的半监督聚类[J].软件学报,2008,9(11):2803-2813
[6] 董俊,王锁萍,熊范纶.可变相似性度量的近邻传播聚类[J].电子与信息学报,2010,32(3):509-514
[7] Zhou D,Bousquet O.Learning with Local and Global Consistency [A]∥Proceeding of advances in Neural Information Proces-sing Systems [C].Cambridge:MIT Press,2004:321-328
[8] Zelnik-Manor L,Perona P.Self-tuning spectral clustering[M].Advances in Neural Information Processing Systems (NIPS),Cambridge,MA:MIT Press,2004
[9] Ertoz L,Steinbach M,Kumar V.A new shared nearest neighbor clustering algorithm and its applications[C]∥Workshop on Clustering High Dimensional Data and its Applications,Second SIAM International Conference on Data Mining.Arlington,VA,USA,2002
[10] Zhang X L,Furtlehner C.Toward autonomic grids:Analyzing the job flow with affinity streaming[C]∥KDD ’09:Proceedings of the 15th ACM SIGKDD international conference on Know-ledge discovery and data mining.2009
[11] Song Y,Chen W-Y,Bai H,et al.Parallel spectral clustering[C]∥Proceedings of Learning and Principles and Practice of Know-ledge Discovery in Databases (ECML/PKDD).2008:374-389
[12] Asuncion A,Newman D J.UCI machine learning repository.University of California,Irvine,School of Information and Computer Sciences[C/OL].http://archive.ics.uci.edu/ml/datasets.html,2007

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!