基于Twitter Storm平台并行挖掘最稠密子图

Abstract

Abstract: In large scale graph,finding densest subgraph has a wide range of applications,such as community discovery,spam detection and reference relation extraction．Based on tagged undirected graph,we introduced the concept of QLS and designed an approximation algorithm DSFLC which can quickly find the densest subgraph:users submit a QLS and the algorithm will return the densest subgraph under QLS within the time that user can accept．For any ε＞0,DSFLC only needs to scan large-scale data sets O(log1+εn)times,and can ensure the approximation algorithm is a 2(1＋ε)－approximation algorithm．After analyzing DSFLC,we found this algorithm is easy to parallelize,so we chose Twitter Storm platform to parallel DSFLC algorithm．Finally,the test data sets extracted from the DBLP database verify DSFLC’ practicality and scalability.

Key words: Densest subgraph finding,QLS,DSFLC algorithm,Twitter storm platform

WANG Jin-ming and WANG Yuan-fang. Parallel Mining of Densest Subgraph Based on Twitter Storm[J].Computer Science, 2014, 41(1): 274-278.

References

[1] Yon D,Filippo G,Marco P．Extraction and classification of dense communities in the Web[C]∥WWW 2007．2007:461-470
[2] Newman M E J．Modularity and community structure in net-works[C]∥PNAS 2006．2006:8577-8582
[3] William F G,Steve L,Lee G C．Efficient identification of Web communities[C]∥KDD 2000．2000:150-160
[4] https://github.com/nathanmarz/storm
[5] http://www.dblp.org/db/
[6] Goldberg A V．Finding a maximum density Subgraph[R]．UCB/CSD-84-171．EECS Department,University of California,Berkeley,1984
[7] Charikar M.Greedy approximation algorithms for finding dense components in a graph[C]∥APPROX 2000．2000:84-90
[8] Lawler F．Combinatorial Optimization:Networks and Matroids[M]．Holt,Rinehart,and Winston,1976
[9] Gibson D,Kumar R,Tomkins A．Discovering large dense subgraphs in massive graphs[C]∥VLDB 2005．2005:721-732
[10] Bahmani B,umar R．Sergei Vassilvitskii:Densest Subgraph inStreaming and MapReduce[C]∥VLDB 2012．2012:454-465
[11] Silva A,Meira W Jr,Zaki M J．Mining Attribute-structure Correlated Patterns in Large Attributed Graphs[C]∥VLDB．2012:466-477
[12] Jeffrey D,Sanjay G．MapReduce:simplified data processing onlarge clusters[C]∥ACM 2008．2008:107-113
[13] Bu Ying-yi,Howe B,Balazinska M,et al.HaLoop:Efficient Iterative Data Processing on Large Clusters[C]∥VLDB 2010．2010:285-296
[14] Abouzeid A,Bajda-Pawlikowski K,Abadi D J,et al.HadoopDB:An architectural hybrid of MapReduce and DBMS technologies for analytical workloads[C]∥VLDB 2009．2009:922-933
[15] Condie T,Conway N,Alvaro P,et al.MapReduce Online[C]∥NSDI’10Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation.USENIX Association Berkeley,CA,USA,2010:21

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Parallel Mining of Densest Subgraph Based on Twitter Storm

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 0

Metrics

Comments

Recommended 0