计算机科学 ›› 2018, Vol. 45 ›› Issue (12): 299-307.doi: 10.11896/j.issn.1002-137X.2018.12.048

• 交叉与前沿 • 上一篇    下一篇

一种基于MapReduce的不确定图上的相似性连接方法

缪丰羽1, 王宏志2, 阮群生3   

  1. (宁德师范学院信息与机电工程学院 福建 宁德352100)1
    (哈尔滨工业大学计算机科学与技术学院 哈尔滨150001)2
    (厦门大学软件学院 福建 厦门361000)3
  • 收稿日期:2017-12-28 出版日期:2018-12-15 发布日期:2019-02-25
  • 作者简介:缪丰羽(1983-),女,硕士,讲师,主要研究方向为图数据管理、XML数据库查询优化,E-mail:feathen1983@163.com(通信作者);王宏志(1978-),男,博士,副教授,博士生导师,主要研究方向为大数据、数据质量、图数据管理;阮群生(1979-),男,博士生,副教授,主要研究方向为大数据、人工智能。
  • 基金资助:
    本文受国家自然科学基金重点项目(U1509216),国家重点研发计划项目(2016YFB1000703),国家自然科学基金面上项目(61472099,61602129),福建省自然科学基金面上项目(2018J01555),福建省教育厅中青年项目(JAT170653)资助。

Method of Similarity Join on Uncertain Graphs Using MapReduce

MIAO Feng-yu1, WANG Hong-zhi2, RUAN Qun-sheng3   

  1. (College of Information & Mechanical and Electrical Engineering,Ningde Normal University,Ningde,Fujian 352100,China)1
    (School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China)2
    (College of Software,Xiamen University,Xiamen,Fujian 361000,China)3
  • Received:2017-12-28 Online:2018-12-15 Published:2019-02-25

摘要: 相比于确定图上的相似性连接,不确定图上的相似性连接通常具有更大的实际应用价值以及计算复杂性。文中研究了基于MapReduce分布式编程框架的不确定图上的相似性连接问题,提出了基于概率和的Map方剪枝和Reduce方剪枝的两种剪枝策略。Map方剪枝策略在映射过程中过滤掉了不可能具有相似图的不确定图。Reduce方剪枝策略用于减少约减过程中的候选图对。基于这两种剪枝策略,文中提出了一种基于MapReduce框架的不确定图上的相似性连接算法MUGSJoin。实验结果证明,该算法与同类算法相比具有更好的性能和可扩展性。

关键词: MapReduce, 不确定图, 相似性连接

Abstract: Compared to the similarity join on the deterministic graph,the similarity join on the uncertain graph usually has greater practical application value and higher computational complexity.This paper studied the similarity join on uncertain graph databases based on MapReduce distributed programming framework,and proposed two pruning strategies using the existence probability,namely Map Side Pruning and Reduce Side Pruning.The Map Side Pruning can filter out the uncertain graphs at the Map side which have no chance to be similar with any other uncertain graph at the Map side.The Reduce Side Pruning can reduce the candidate pairs at Reduce side in the process of reduction.Based on the above pruning approaches,this paper proposed a similarity join algorithm MUGSJoin on uncertain graph databases based on MapReduce.The experiment results show that the proposed approach has much better performance and scala-bility than the similar algorithms.

Key words: MapReduce, Similarity join, Uncertain graph

中图分类号: 

  • TP311
[1]ZHAO X,XIAO C,LIN X,et al.Efficient Graph Similarity Joins with Edit Distance Constraints[C]∥Proceedings of IEEE Conference on Data Engineering.IEEE Press,2012:834-845.
[2]YUAN Y,WANG G,CHEN L,et al.Graph similarity search on large uncertain graph databases[J].Vldb Journal,2015,24(2):271-296.
[3]ZHENG W,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]∥ACM International Conference on Conference on Information & Know-ledge Management.ACM,2013:1595-1600.
[4]WANG Y,WANG H,LI J,et al.Efficient graph similarity join for information integration on graphs[J].Frontiers of Computer Science,2016,10(2):317-329.
[5]PANG J,YU G,XU J,et al.Similarity Joins on Massive Data Based on MapReduce Framework[J].Computer Science,2015,42(1):1-5.(in Chinese)
庞俊,于戈,许嘉,等.基于MapReduce框架的海量数据相似性连接研究进展[J].计算机科学,2015,42(1):1-5.
[6]MIAO F Y,WANG H Z.A Method of Similarity Join on Uncertain Graph Database[J].Journal of Software,2018,29(10):3150-3163.(in Chinese)
缪丰羽,王宏志.一种不确定图数据库上的相似性连接方法[J].软件学报,2018,29(10):3150-3163.
[7]LIN J,SCHATZ M.Design patterns for efficient graph algorithms in MapReduce[C]∥Eighth Workshop on Mining & Learning with Graphs(Mlg 10).2010:78-85.
[8]PLIMPTON S J,DEVINE K D.MapReduce in MPI for Large-scale graph algorithms[J].Parallel Computing,2011,37(9):610-632.
[9]QIN L,YU J X,CHANG L,et al.Scalable big graph processing in MapReduce[C]∥Proceedings of the ACM SIGMOD International Conference on Mangement of Data.2014:22-27.
[10]KOLDA T G,PINAR A,PLANTENGA T,et al.Counting Trian-gles in Massive Graphs with MapReduce [J].Siam Journal on Scientific Computing,2013,36(5):48-77.
[11]CHEN Y,ZHAO X,XIAO C,et al.Efficient and Scalable Graph Similarity Joins in MapReduce[J].The Scientific World Journal,2014,2014(3):749028.
[12]CHEN Y F,ZHAO X,HE P J,et al.BMGSJoin:A MapReduce Based Graph Similarity Join Algorithm[J].Pattem Recognition and Artificial Intelligence,2015,28(5):472-480.(in Chinese)
陈一帆,赵翔,何培俊,等.BMGSJoin:一种基于MapReduce的图相似度连接算法[J].模式识别与人工智能,2015,28(5):472-480.
[13]PANG J,GU Y,XU J,et al.Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce[J].Lecture Notes in Computer Science,2014,8485:415-418.
[14]SANJAY G,HOWARD G,SHAN-TAK L.The Google file system[J].ACM SIGOPS Operationg Systems Review,2003,37(5):29-43.
[15]AFRATI F N,ULLMAN J D.Optimizing joins in a map-reduce environment[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(9):1282-1298.
[16]OKCAN A,RIEDEWALD M.Processing theta-joins using MapReduce[C]∥ACM SIGMOD International Conference on Management of Data(SIGMOD 2011).Athens,Greece,DBLP,2011:949-960.
[17]YANG H C,DASDAN A,HSIAO R L,et al.Map-reduce-merge:simplified relational data processing on large clusters[C]∥ACM SIGMOD International Conference on Management of Data.ACM,2007:1029-1040.
[18]AFRATI F N,SARMA A D,MENESTRINA D,et al.FuzzyJoins Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE,2012:498-509.
[19]LUO W,TAN H,MAO H,et al.Efficient Similarity Joins on Massive High-Dimensional Datasets Using MapReduce[C]∥IEEE International Conference on Mobile Data Management.IEEE Computer Society,2012:1-10.
[20]WANG Y,WANG H,LI J,et al.Efficient subgraph join based on connectivity similarity[J].World Wide Web-internet & Web Information Systems,2015,18(4):871-887.
[21]KIM Y,SHIM K.Parallel Top-K Similarity Join Algorithms Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE Computer Society,2012:510-521.
[22]MORRIS R,MORRIS R.ExOR:opportunistic multi-hop routing for wireless networks[C]∥Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.ACM,2005:133-144.
[23]CEOL A,CHATR ARYAMONTRI A,LICATA L,et al.MINT,the molecular interaction database:2009 update.
[J].Nucleic Acids Research,2010,38(suppl1):D532-D539.
[24]CHUA H N,SUNG W K,WONG L.Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions[M]∥Data Mining for Biomedical Applications.Springer Berlin Heidelberg,2006:1623-1630.
[1] 刘卫明, 安冉, 毛伊敏.
基于聚类和WOA的并行支持向量机算法
Parallel Support Vector Machine Algorithm Based on Clustering and WOA
计算机科学, 2022, 49(7): 64-72. https://doi.org/10.11896/jsjkx.210500040
[2] 张元鸣, 虞家睿, 蒋建波, 陆佳炜, 肖刚.
面向MapReduce的中间数据传输流水线优化机制
Intermediate Data Transmission Pipeline Optimization Mechanism for MapReduce Framework
计算机科学, 2021, 48(2): 41-46. https://doi.org/10.11896/jsjkx.191000103
[3] 王童, 马文平, 罗维.
基于区块链的信息共享及安全多方计算模型
Information Sharing and Secure Multi-party Computing Model Based on Blockchain
计算机科学, 2019, 46(9): 162-168. https://doi.org/10.11896/j.issn.1002-137X.2019.09.023
[4] 王晓霞, 孙德才.
一种基于Q-sample的局部相似连接并行算法
Q-sample-based Local Similarity Join Parallel Algorithm
计算机科学, 2019, 46(12): 38-44. https://doi.org/10.11896/jsjkx.190100240
[5] 胡赢双, 陆亿红.
基于MapReduce的强连通网格聚类算法
Cell Clustering Algorithm Based on MapReduce and Strongly Connected Fusion
计算机科学, 2019, 46(11A): 204-207.
[6] 齐玉东,何诚,司维超.
MapReduce框架下的Skyline云资源选择算法
Cloud Resource Selection Algorithm by Skyline under MapReduce Frame
计算机科学, 2018, 45(6A): 411-414.
[7] 张滨, 乐嘉锦.
基于列存储的MapReduce分布式Hash连接算法
Hash Join in MapReduce Distributed Environment Based on Column-store
计算机科学, 2018, 45(6A): 471-475.
[8] 周华平,刘光宗,张贝贝.
基于索引偏移的MapReduce聚类负载均衡策略
Load Balancing Strategy of MapReduce Clustering Based on Index Shift
计算机科学, 2018, 45(5): 303-309. https://doi.org/10.11896/j.issn.1002-137X.2018.05.053
[9] 王华进,黎建辉,沈志宏,周园春.
基于ORC元数据的Hive Join查询Reducer负载均衡方法
ORC Metadata Based Reducer Load Balancing Method for Hive Join Queries
计算机科学, 2018, 45(3): 158-164. https://doi.org/10.11896/j.issn.1002-137X.2018.03.025
[10] 应毅, 任凯, 刘亚军.
基于大数据的网络日志分析技术
Network Log Analysis Technology Based on Big Data
计算机科学, 2018, 45(11A): 353-355.
[11] 丁勇, 朱长水, 武玉艳.
一种基于Hadoop的关联规则挖掘算法
Association Rule Mining Algorithm Based on Hadoop
计算机科学, 2018, 45(11A): 409-411.
[12] 宋哲理, 王超, 王振飞.
基于MapReduce的多级特征选择机制
Multi-level Feature Selection Mechanism Based on MapReduce
计算机科学, 2018, 45(11A): 468-473.
[13] 田贤忠,沈杰.
大数据环境下基于概率矩阵分解的个性化推荐
Personalized Recommendation Based on Probabilistic Matrix Factorization in Big Data Environment
计算机科学, 2017, 44(Z6): 438-441. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.098
[14] 曾赛文,文中华,戴良伟,袁润.
基于不确定攻击图的攻击路径的网络安全分析
Analysis of Network Security Based on Uncertain Attack Graph Path
计算机科学, 2017, 44(Z6): 351-355. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.080
[15] 王硕,孙光明,邹静昭,李伟生.
基于用户推荐影响度的并行协同过滤算法
Parallel Collaborative Filtering Algorithm Based on User Recommended Influence
计算机科学, 2017, 44(9): 250-255. https://doi.org/10.11896/j.issn.1002-137X.2017.09.047
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!