Computer Science (计算机科学), 2014, Vol. 41, Issue 4: 269-272.
MIAO Yu-qing (缪裕青), ZHANG Jin-xing (张锦杏), LIU Shao-bing (刘少兵), WEN Yi-min (文益民), MING Mei (明媚)
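Abstract: Many existing clustering algorithms cannot process large-scale data effectively. To address this problem, this paper proposes bigKClustering, a clustering algorithm based on micro-clusters and an equivalence connection relation that can be parallelized efficiently on the Hadoop platform. The algorithm abstracts each compact group of data points into a single vector, then connects these vectors through an equivalence relation to obtain the final clustering result. Experimental results show that bigKClustering not only achieves good time efficiency and clustering quality, but also exhibits good scalability, speedup, and runtime stability.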
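The two stages named in the abstract (summarizing compact data into vectors, then connecting those vectors through an equivalence relation) can be illustrated with a rough single-machine sketch. This is not the authors' implementation: the abstract does not say how micro-clusters are formed or what the connection criterion is, so the greedy assignment rule, the distance thresholds `radius` and `link_dist`, and the function names `build_micro_clusters` and `connect` are all assumptions made for illustration. The sketch also omits the Hadoop parallelization entirely.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_micro_clusters(points, radius):
    """Summarize compact data as vectors (assumed rule, not from the paper).

    Each point joins the first micro-cluster whose centroid lies within
    `radius`; otherwise it starts a new micro-cluster. A micro-cluster is
    stored as [count, linear_sum], and its centroid is returned as the
    summary vector.
    """
    clusters = []
    for p in points:
        for mc in clusters:
            centroid = [s / mc[0] for s in mc[1]]
            if dist(p, centroid) <= radius:
                mc[0] += 1
                mc[1] = [s + x for s, x in zip(mc[1], p)]
                break
        else:
            clusters.append([1, list(p)])
    return [[s / n for s in linear_sum] for n, linear_sum in clusters]


def connect(centroids, link_dist):
    """Group vectors by the transitive closure of 'closer than link_dist'.

    The closure is an equivalence relation, so a union-find structure
    yields its equivalence classes. The closeness criterion itself is a
    hypothetical stand-in for the paper's connection relation.
    """
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= link_dist:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(centroids))]
```

Calling `connect(build_micro_clusters(points, radius), link_dist)` returns one label per micro-cluster; micro-clusters that share a label belong to the same final cluster. Because only the summary vectors are compared in the second stage, its cost depends on the number of micro-clusters rather than on the number of raw points.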