Computer Science ›› 2018, Vol. 45 ›› Issue (12): 299-307.doi: 10.11896/j.issn.1002-137X.2018.12.048

• Interdiscipline & Frontier • Previous Articles     Next Articles

Method of Similarity Join on Uncertain Graphs Using MapReduce

MIAO Feng-yu1, WANG Hong-zhi2, RUAN Qun-sheng3   

  1. (College of Information & Mechanical and Electrical Engineering,Ningde Normal University,Ningde,Fujian 352100,China)1
    (School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China)2
    (College of Software,Xiamen University,Xiamen,Fujian 361000,China)3
  • Received:2017-12-28 Online:2018-12-15 Published:2019-02-25

Abstract: Compared to the similarity join on the deterministic graph,the similarity join on the uncertain graph usually has greater practical application value and higher computational complexity.This paper studied the similarity join on uncertain graph databases based on MapReduce distributed programming framework,and proposed two pruning strategies using the existence probability,namely Map Side Pruning and Reduce Side Pruning.The Map Side Pruning can filter out the uncertain graphs at the Map side which have no chance to be similar with any other uncertain graph at the Map side.The Reduce Side Pruning can reduce the candidate pairs at Reduce side in the process of reduction.Based on the above pruning approaches,this paper proposed a similarity join algorithm MUGSJoin on uncertain graph databases based on MapReduce.The experiment results show that the proposed approach has much better performance and scala-bility than the similar algorithms.

Key words: MapReduce, Similarity join, Uncertain graph

CLC Number: 

  • TP311
[1]ZHAO X,XIAO C,LIN X,et al.Efficient Graph Similarity Joins with Edit Distance Constraints[C]∥Proceedings of IEEE Conference on Data Engineering.IEEE Press,2012:834-845.
[2]YUAN Y,WANG G,CHEN L,et al.Graph similarity search on large uncertain graph databases[J].Vldb Journal,2015,24(2):271-296.
[3]ZHENG W,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]∥ACM International Conference on Conference on Information & Know-ledge Management.ACM,2013:1595-1600.
[4]WANG Y,WANG H,LI J,et al.Efficient graph similarity join for information integration on graphs[J].Frontiers of Computer Science,2016,10(2):317-329.
[5]PANG J,YU G,XU J,et al.Similarity Joins on Massive Data Based on MapReduce Framework[J].Computer Science,2015,42(1):1-5.(in Chinese)
庞俊,于戈,许嘉,等.基于MapReduce框架的海量数据相似性连接研究进展[J].计算机科学,2015,42(1):1-5.
[6]MIAO F Y,WANG H Z.A Method of Similarity Join on Uncertain Graph Database[J].Journal of Software,2018,29(10):3150-3163.(in Chinese)
缪丰羽,王宏志.一种不确定图数据库上的相似性连接方法[J].软件学报,2018,29(10):3150-3163.
[7]LIN J,SCHATZ M.Design patterns for efficient graph algorithms in MapReduce[C]∥Eighth Workshop on Mining & Learning with Graphs(Mlg 10).2010:78-85.
[8]PLIMPTON S J,DEVINE K D.MapReduce in MPI for Large-scale graph algorithms[J].Parallel Computing,2011,37(9):610-632.
[9]QIN L,YU J X,CHANG L,et al.Scalable big graph processing in MapReduce[C]∥Proceedings of the ACM SIGMOD International Conference on Mangement of Data.2014:22-27.
[10]KOLDA T G,PINAR A,PLANTENGA T,et al.Counting Trian-gles in Massive Graphs with MapReduce [J].Siam Journal on Scientific Computing,2013,36(5):48-77.
[11]CHEN Y,ZHAO X,XIAO C,et al.Efficient and Scalable Graph Similarity Joins in MapReduce[J].The Scientific World Journal,2014,2014(3):749028.
[12]CHEN Y F,ZHAO X,HE P J,et al.BMGSJoin:A MapReduce Based Graph Similarity Join Algorithm[J].Pattem Recognition and Artificial Intelligence,2015,28(5):472-480.(in Chinese)
陈一帆,赵翔,何培俊,等.BMGSJoin:一种基于MapReduce的图相似度连接算法[J].模式识别与人工智能,2015,28(5):472-480.
[13]PANG J,GU Y,XU J,et al.Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce[J].Lecture Notes in Computer Science,2014,8485:415-418.
[14]SANJAY G,HOWARD G,SHAN-TAK L.The Google file system[J].ACM SIGOPS Operationg Systems Review,2003,37(5):29-43.
[15]AFRATI F N,ULLMAN J D.Optimizing joins in a map-reduce environment[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(9):1282-1298.
[16]OKCAN A,RIEDEWALD M.Processing theta-joins using MapReduce[C]∥ACM SIGMOD International Conference on Management of Data(SIGMOD 2011).Athens,Greece,DBLP,2011:949-960.
[17]YANG H C,DASDAN A,HSIAO R L,et al.Map-reduce-merge:simplified relational data processing on large clusters[C]∥ACM SIGMOD International Conference on Management of Data.ACM,2007:1029-1040.
[18]AFRATI F N,SARMA A D,MENESTRINA D,et al.FuzzyJoins Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE,2012:498-509.
[19]LUO W,TAN H,MAO H,et al.Efficient Similarity Joins on Massive High-Dimensional Datasets Using MapReduce[C]∥IEEE International Conference on Mobile Data Management.IEEE Computer Society,2012:1-10.
[20]WANG Y,WANG H,LI J,et al.Efficient subgraph join based on connectivity similarity[J].World Wide Web-internet & Web Information Systems,2015,18(4):871-887.
[21]KIM Y,SHIM K.Parallel Top-K Similarity Join Algorithms Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE Computer Society,2012:510-521.
[22]MORRIS R,MORRIS R.ExOR:opportunistic multi-hop routing for wireless networks[C]∥Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.ACM,2005:133-144.
[23]CEOL A,CHATR ARYAMONTRI A,LICATA L,et al.MINT,the molecular interaction database:2009 update.
[J].Nucleic Acids Research,2010,38(suppl1):D532-D539.
[24]CHUA H N,SUNG W K,WONG L.Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions[M]∥Data Mining for Biomedical Applications.Springer Berlin Heidelberg,2006:1623-1630.
[1] LIU Wei-ming, AN Ran, MAO Yi-min. Parallel Support Vector Machine Algorithm Based on Clustering and WOA [J]. Computer Science, 2022, 49(7): 64-72.
[2] ZHANG Yuan-ming, YU Jia-rui, JIANG Jian-bo, LU Jia-wei, XIAO Gang. Intermediate Data Transmission Pipeline Optimization Mechanism for MapReduce Framework [J]. Computer Science, 2021, 48(2): 41-46.
[3] WANG Tong, MA Wen-ping, LUO Wei. Information Sharing and Secure Multi-party Computing Model Based on Blockchain [J]. Computer Science, 2019, 46(9): 162-168.
[4] WANG Xiao-xia, SUN De-cai. Q-sample-based Local Similarity Join Parallel Algorithm [J]. Computer Science, 2019, 46(12): 38-44.
[5] HU Ying-shuang, LU Yi-hong. Cell Clustering Algorithm Based on MapReduce and Strongly Connected Fusion [J]. Computer Science, 2019, 46(11A): 204-207.
[6] QI Yu-dong,HE Cheng,SI Wei-chao. Cloud Resource Selection Algorithm by Skyline under MapReduce Frame [J]. Computer Science, 2018, 45(6A): 411-414.
[7] ZHANG Bin, LE Jia-jin. Hash Join in MapReduce Distributed Environment Based on Column-store [J]. Computer Science, 2018, 45(6A): 471-475.
[8] ZHOU Hua-ping, LIU Guang-zong and ZHANG Bei-bei. Load Balancing Strategy of MapReduce Clustering Based on Index Shift [J]. Computer Science, 2018, 45(5): 303-309.
[9] WANG Hua-jin, LI Jian-hui, SHEN Zhi-hong and ZHOU Yuan-chun. ORC Metadata Based Reducer Load Balancing Method for Hive Join Queries [J]. Computer Science, 2018, 45(3): 158-164.
[10] YING Yi, REN Kai, LIU Ya-jun. Network Log Analysis Technology Based on Big Data [J]. Computer Science, 2018, 45(11A): 353-355.
[11] DING Yong, ZHU Chang-shui, WU Yu-yan. Association Rule Mining Algorithm Based on Hadoop [J]. Computer Science, 2018, 45(11A): 409-411.
[12] SONG Zhe-li, WANG Chao, WANG Zhen-fei. Multi-level Feature Selection Mechanism Based on MapReduce [J]. Computer Science, 2018, 45(11A): 468-473.
[13] TIAN Xian-zhong and SHEN Jie. Personalized Recommendation Based on Probabilistic Matrix Factorization in Big Data Environment [J]. Computer Science, 2017, 44(Z6): 438-441.
[14] ZENG Sai-wen, WEN Zhong-hua, DAI Liang-wei and YUAN Run. Analysis of Network Security Based on Uncertain Attack Graph Path [J]. Computer Science, 2017, 44(Z6): 351-355.
[15] WANG Shuo, SUN Guang-ming, ZOU Jing-zhao and LI Wei-sheng. Parallel Collaborative Filtering Algorithm Based on User Recommended Influence [J]. Computer Science, 2017, 44(9): 250-255.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!