计算机科学 ›› 2013, Vol. 40 ›› Issue (10): 190-193.

• 软件与数据库技术 • 上一篇    下一篇

MR-GSpar:一种基于MapReduce的大图稀疏化算法

陈德华,周蒙,孙延青,郑亮亮   

  1. 东华大学计算机科学与技术学院 上海201620;东华大学计算机科学与技术学院 上海201620;上海市浦东新区电子政务管理中心 上海 200135;东华大学计算机科学与技术学院 上海201620
  • 出版日期:2018-11-16 发布日期:2018-11-16
  • 基金资助:
    本文受国家自然科学基金项目(61070031,61070032)资助

MR-GSpar:A Distributed Large Graph Sparsification Algorithm Based on MapReduce

CHEN De-hua,ZHOU Meng,SUN Yan-qing and ZHENG Liang-liang   

  • Online:2018-11-16 Published:2018-11-16

摘要: 图的稀疏化是图聚类分析中数据预处理的关键操作,已得到广泛的关注。针对图数据日益普及、规模不断增大的现状,提出了一种基于MapReduce的面向大规模图的稀疏化算法,即MR-GSpar算法。该算法在MapReduce并行计算框架的基础上,通过对传统的最小哈希(Minhash)算法的并行化改造,使其可在分布式的集群环境中实现对大规模图数据的高效稀疏化处理。真实数据集上的实验表明了该算法的可行性与有效性。

关键词: 图稀疏化,Minhash,MapReduce框架,MR-GSpar算法

Abstract: As an important data pre-processing operation,graph sparsification has attracted wide attentions from the area of database.Nowadays the graph data is becoming popular and scale.Thus this paper proposed an efficient parallel graph sparsification algorithm,namely the MR-GSpar algorithm.The MR-GSpar algorithm is presented by reforming the traditional Minhash algorithm into a parallel and distributed algorithm using MapReduce framework,which can archive efficient sparsification on large-scale graph data in a large machine cluster environment.Experiments on real datasets show that the algorithm is feasible and effective.

Key words: Graph sparsification,Minhsh,MapReduce framework,MR-GSpar algorithm

[1] Satuluri V,Parthasarathy S.Scalable graph clustering using stochastic flows:applications to community discovery[C]∥ACM SIGKDD.2009:737-746
[2] Kulis B,Basu S,Dhillon I,et al.Semi-supervised Graph Clustering:A Kernel Approach[J].Machine Learning,2009,74(1):1-22
[3] Satuluri V,Parthasarathy S,Ruan Y.Local graph Sparsification for Scalable Clustering[C]∥SIGMOD.2011:737-746
[4] 李建江,崔健,王聃,等.MapReduce并行编程模型研究综述[J].电子学报,2011,39(11):2635-2642
[5] Lin J,Schatz M.Design patterns for efficient graph algorithms in mapreduce[C]∥MLG.2010:78-85
[6] 尹丹,高宏,邹兆年,等.一种新的高效图聚集算法[J].计算机研究与发展,2011,48(10):1831-1841
[7] Lv Qin,Josephson W,Wang Zhe,et al.Multi-probe LSH:efficient indexing for high-dimensional similarity search[C]∥Proc of the 33rd Int Conf on Very Large Data Bases(VLDB’07). Vien-na Austria:VLDB Endowment,2007:950-961

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!