Computer Science, 2014, Vol. 41, Issue (Z11): 316-319.

• Data Mining •

Research of FCM Algorithm Based on Canopy Clustering Algorithm under Cloud Environment

YU Chang-jun and ZHANG Ran

  1. School of Computer Science, Wuhan University of Technology, Wuhan 430063
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    Ministry of Education project "Research on Rapid Sharing of Scientific Papers in the Network Era" (20111140004)

Abstract: Fuzzy C-Means (FCM) is one of the most widely used clustering algorithms, but its clustering quality and convergence speed depend on the choice of initial cluster centers. Because the Canopy algorithm can cluster a data set quickly, if only coarsely, we propose an FCM algorithm based on Canopy clustering: the cluster centers that Canopy obtains quickly are used as the input to FCM, which accelerates FCM's convergence. We also design a MapReduce implementation of the algorithm for a cloud environment. Experimental results show that the MapReduce version of the Canopy-based FCM algorithm achieves better clustering quality and runs faster than the MapReduce version of the plain FCM algorithm.

Key words: FCM algorithm (Fuzzy C-Means algorithm), Clustering, MapReduce, Cloud environment
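
The abstract does not spell out the algorithm's details, so the following is only a minimal single-machine sketch of the idea it describes, not the authors' MapReduce implementation. It assumes Euclidean distance, canopy centers taken as the mean of each canopy's members, the standard FCM update rules with fuzzifier m = 2, and illustrative names and thresholds (`canopy_centers`, `fcm`, `t1`, `t2`, `eps`) that do not come from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Coarse Canopy pass with loose threshold t1 > tight threshold t2.

    Returns one center per canopy. Using the mean of the canopy members
    as the center is an assumption of this sketch.
    """
    assert t1 > t2 > 0
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every point within t1 of the seed joins this canopy.
        members = [p for p in remaining if dist(seed, p) < t1]
        centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        # Points within t2 of the seed (including the seed) cannot seed a new canopy.
        remaining = [p for p in remaining if dist(seed, p) >= t2]
    return centers


def fcm(points, centers, m=2.0, eps=1e-4, max_iter=100):
    """Standard FCM iterations started from the given initial centers.

    Returns (centers, memberships, iterations); the memberships are the
    ones computed in the final iteration.
    """
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    power = 2.0 / (m - 1.0)
    u = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        u = []
        for p in points:
            ds = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
            u.append([1.0 / sum((di / dk) ** power for dk in ds) for di in ds])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[d] for wj, p in zip(w, points)) / total
                for d in range(dims)))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:  # stop once no center moves more than eps
            break
    return centers, u, iterations
```

A call such as `fcm(points, canopy_centers(points, t1, t2))` then replaces random initialization: the number of clusters and their starting positions both come from the Canopy pass, which is the step the abstract credits with faster convergence.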

