计算机科学 ›› 2018, Vol. 45 ›› Issue (12): 299-307.doi: 10.11896/j.issn.1002-137X.2018.12.048
缪丰羽1, 王宏志2, 阮群生3
MIAO Feng-yu1, WANG Hong-zhi2, RUAN Qun-sheng3
摘要: 相比于确定图上的相似性连接,不确定图上的相似性连接通常具有更大的实际应用价值以及计算复杂性。文中研究了基于MapReduce分布式编程框架的不确定图上的相似性连接问题,提出了基于概率和的Map方剪枝和Reduce方剪枝的两种剪枝策略。Map方剪枝策略在映射过程中过滤掉了不可能具有相似图的不确定图。Reduce方剪枝策略用于减少约减过程中的候选图对。基于这两种剪枝策略,文中提出了一种基于MapReduce框架的不确定图上的相似性连接算法MUGSJoin。实验结果证明,该算法与同类算法相比具有更好的性能和可扩展性。
中图分类号:
[1]ZHAO X,XIAO C,LIN X,et al.Efficient Graph Similarity Joins with Edit Distance Constraints[C]∥Proceedings of IEEE Conference on Data Engineering.IEEE Press,2012:834-845. [2]YUAN Y,WANG G,CHEN L,et al.Graph similarity search on large uncertain graph databases[J].Vldb Journal,2015,24(2):271-296. [3]ZHENG W,ZOU L,LIAN X,et al.Graph similarity search with edit distance constraint in large graph databases[C]∥ACM International Conference on Conference on Information & Know-ledge Management.ACM,2013:1595-1600. [4]WANG Y,WANG H,LI J,et al.Efficient graph similarity join for information integration on graphs[J].Frontiers of Computer Science,2016,10(2):317-329. [5]PANG J,YU G,XU J,et al.Similarity Joins on Massive Data Based on MapReduce Framework[J].Computer Science,2015,42(1):1-5.(in Chinese) 庞俊,于戈,许嘉,等.基于MapReduce框架的海量数据相似性连接研究进展[J].计算机科学,2015,42(1):1-5. [6]MIAO F Y,WANG H Z.A Method of Similarity Join on Uncertain Graph Database[J].Journal of Software,2018,29(10):3150-3163.(in Chinese) 缪丰羽,王宏志.一种不确定图数据库上的相似性连接方法[J].软件学报,2018,29(10):3150-3163. [7]LIN J,SCHATZ M.Design patterns for efficient graph algorithms in MapReduce[C]∥Eighth Workshop on Mining & Learning with Graphs(Mlg 10).2010:78-85. [8]PLIMPTON S J,DEVINE K D.MapReduce in MPI for Large-scale graph algorithms[J].Parallel Computing,2011,37(9):610-632. [9]QIN L,YU J X,CHANG L,et al.Scalable big graph processing in MapReduce[C]∥Proceedings of the ACM SIGMOD International Conference on Mangement of Data.2014:22-27. [10]KOLDA T G,PINAR A,PLANTENGA T,et al.Counting Trian-gles in Massive Graphs with MapReduce [J].Siam Journal on Scientific Computing,2013,36(5):48-77. [11]CHEN Y,ZHAO X,XIAO C,et al.Efficient and Scalable Graph Similarity Joins in MapReduce[J].The Scientific World Journal,2014,2014(3):749028. [12]CHEN Y F,ZHAO X,HE P J,et al.BMGSJoin:A MapReduce Based Graph Similarity Join Algorithm[J].Pattem Recognition and Artificial Intelligence,2015,28(5):472-480.(in Chinese) 陈一帆,赵翔,何培俊,等.BMGSJoin:一种基于MapReduce的图相似度连接算法[J].模式识别与人工智能,2015,28(5):472-480. [13]PANG J,GU Y,XU J,et al.Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce[J].Lecture Notes in Computer Science,2014,8485:415-418. [14]SANJAY G,HOWARD G,SHAN-TAK L.The Google file system[J].ACM SIGOPS Operationg Systems Review,2003,37(5):29-43. [15]AFRATI F N,ULLMAN J D.Optimizing joins in a map-reduce environment[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(9):1282-1298. [16]OKCAN A,RIEDEWALD M.Processing theta-joins using MapReduce[C]∥ACM SIGMOD International Conference on Management of Data(SIGMOD 2011).Athens,Greece,DBLP,2011:949-960. [17]YANG H C,DASDAN A,HSIAO R L,et al.Map-reduce-merge:simplified relational data processing on large clusters[C]∥ACM SIGMOD International Conference on Management of Data.ACM,2007:1029-1040. [18]AFRATI F N,SARMA A D,MENESTRINA D,et al.FuzzyJoins Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE,2012:498-509. [19]LUO W,TAN H,MAO H,et al.Efficient Similarity Joins on Massive High-Dimensional Datasets Using MapReduce[C]∥IEEE International Conference on Mobile Data Management.IEEE Computer Society,2012:1-10. [20]WANG Y,WANG H,LI J,et al.Efficient subgraph join based on connectivity similarity[J].World Wide Web-internet & Web Information Systems,2015,18(4):871-887. [21]KIM Y,SHIM K.Parallel Top-K Similarity Join Algorithms Using MapReduce[C]∥IEEE International Conference on Data Engineering.IEEE Computer Society,2012:510-521. [22]MORRIS R,MORRIS R.ExOR:opportunistic multi-hop routing for wireless networks[C]∥Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.ACM,2005:133-144. [23]CEOL A,CHATR ARYAMONTRI A,LICATA L,et al.MINT,the molecular interaction database:2009 update. [J].Nucleic Acids Research,2010,38(suppl1):D532-D539. [24]CHUA H N,SUNG W K,WONG L.Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions[M]∥Data Mining for Biomedical Applications.Springer Berlin Heidelberg,2006:1623-1630. |
[1] | 刘卫明, 安冉, 毛伊敏. 基于聚类和WOA的并行支持向量机算法 Parallel Support Vector Machine Algorithm Based on Clustering and WOA 计算机科学, 2022, 49(7): 64-72. https://doi.org/10.11896/jsjkx.210500040 |
[2] | 张元鸣, 虞家睿, 蒋建波, 陆佳炜, 肖刚. 面向MapReduce的中间数据传输流水线优化机制 Intermediate Data Transmission Pipeline Optimization Mechanism for MapReduce Framework 计算机科学, 2021, 48(2): 41-46. https://doi.org/10.11896/jsjkx.191000103 |
[3] | 王童, 马文平, 罗维. 基于区块链的信息共享及安全多方计算模型 Information Sharing and Secure Multi-party Computing Model Based on Blockchain 计算机科学, 2019, 46(9): 162-168. https://doi.org/10.11896/j.issn.1002-137X.2019.09.023 |
[4] | 王晓霞, 孙德才. 一种基于Q-sample的局部相似连接并行算法 Q-sample-based Local Similarity Join Parallel Algorithm 计算机科学, 2019, 46(12): 38-44. https://doi.org/10.11896/jsjkx.190100240 |
[5] | 胡赢双, 陆亿红. 基于MapReduce的强连通网格聚类算法 Cell Clustering Algorithm Based on MapReduce and Strongly Connected Fusion 计算机科学, 2019, 46(11A): 204-207. |
[6] | 齐玉东,何诚,司维超. MapReduce框架下的Skyline云资源选择算法 Cloud Resource Selection Algorithm by Skyline under MapReduce Frame 计算机科学, 2018, 45(6A): 411-414. |
[7] | 张滨, 乐嘉锦. 基于列存储的MapReduce分布式Hash连接算法 Hash Join in MapReduce Distributed Environment Based on Column-store 计算机科学, 2018, 45(6A): 471-475. |
[8] | 周华平,刘光宗,张贝贝. 基于索引偏移的MapReduce聚类负载均衡策略 Load Balancing Strategy of MapReduce Clustering Based on Index Shift 计算机科学, 2018, 45(5): 303-309. https://doi.org/10.11896/j.issn.1002-137X.2018.05.053 |
[9] | 王华进,黎建辉,沈志宏,周园春. 基于ORC元数据的Hive Join查询Reducer负载均衡方法 ORC Metadata Based Reducer Load Balancing Method for Hive Join Queries 计算机科学, 2018, 45(3): 158-164. https://doi.org/10.11896/j.issn.1002-137X.2018.03.025 |
[10] | 应毅, 任凯, 刘亚军. 基于大数据的网络日志分析技术 Network Log Analysis Technology Based on Big Data 计算机科学, 2018, 45(11A): 353-355. |
[11] | 丁勇, 朱长水, 武玉艳. 一种基于Hadoop的关联规则挖掘算法 Association Rule Mining Algorithm Based on Hadoop 计算机科学, 2018, 45(11A): 409-411. |
[12] | 宋哲理, 王超, 王振飞. 基于MapReduce的多级特征选择机制 Multi-level Feature Selection Mechanism Based on MapReduce 计算机科学, 2018, 45(11A): 468-473. |
[13] | 田贤忠,沈杰. 大数据环境下基于概率矩阵分解的个性化推荐 Personalized Recommendation Based on Probabilistic Matrix Factorization in Big Data Environment 计算机科学, 2017, 44(Z6): 438-441. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.098 |
[14] | 曾赛文,文中华,戴良伟,袁润. 基于不确定攻击图的攻击路径的网络安全分析 Analysis of Network Security Based on Uncertain Attack Graph Path 计算机科学, 2017, 44(Z6): 351-355. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.080 |
[15] | 王硕,孙光明,邹静昭,李伟生. 基于用户推荐影响度的并行协同过滤算法 Parallel Collaborative Filtering Algorithm Based on User Recommended Influence 计算机科学, 2017, 44(9): 250-255. https://doi.org/10.11896/j.issn.1002-137X.2017.09.047 |
|