计算机科学 ›› 2019, Vol. 46 ›› Issue (4): 28-35.doi: 10.11896/j.issn.1002-137X.2019.04.005
许文, 宋文爱, 富丽贞, 吕伟
XU Wen, SONG Wen-ai, FU Li-zhen, LV Wei
摘要: 图数据规模的爆发式增长使在单机上的子图匹配变得较为困难。尽管现有的分布式算法可以在一定程度上解决大规模图数据的子图匹配问题,但分布式环境中的网络通信代价仍然影响着算法的性能。为此,文中提出了DSsearch分布式子图匹配算法,包含查询图拆分、数据图预处理、候选顶点过滤、中间结果合并 4个步骤。其中,在数据图预处理步骤中使用图划分和完善邻居顶点策略来降低匹配过程中分布式计算节点之间的通信代价;在过滤候选顶点阶段设计DSgraph存储结构存储候选顶点,通过推迟笛卡尔积来减少冗余的中间结果。最后设计了对比实验并在具有7个计算节点的Spark分布式集群上使用真实数据集进行验证。实验结果表明,DSsearch算法能够在秒级时间内完成对百万规模顶点的数据图的子图匹配,尤其是在处理复杂查询图和稠密数据图方面更高效。数据图预处理策略的实验结果说明了通过顶点复制来降低分布式环境中网络通信代价这一策略的可行性。相比TwinTwigJoin、PSgL等算法,随着查询图顶点数量的增加,DSsearch算法的运行时间增长得更缓慢,当查询图顶点数量达到14时,其运行时间是TwinTwigJoin和PSgL算法的一半。实验数据充分说明,分布式环境中的网络通信代价和中间结果数量是影响分布式子图匹配算法的主要因素。实现数据图的预处理和推迟笛卡尔积解决了分布式子图匹配的性能瓶颈问题,有效地完成了大规模图数据的子图匹配。
中图分类号:
[1]ULLMANN J R.An algorithm for subgraph isomorphism[J].Journal of the ACM,1976,23(1):31-42. [2]CORDELLA L P,FOGGIA P,SANSONE C,et al.A (sub) graph isomorphism algorithm for matching large graphs[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(10):1367-1372. [3]LEE J,HAN W S,KASPEROVICS R,et al.An in-depth comparison of subgraph isomorphism algorithms in graph databases[J].Proceedings of the VLDB Endowment,2012,6(2):133-144. [4]HAN W S,LEE J,LEE J H.Turbo iso:towards ultrafast and robust subgraph isomorphism search in large graph databases[C]∥Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data.ACM,2013:337-348. [5]BI F,CHANG L,LIN X,et al.Efficient subgraph matching by postponing cartesian products[C]∥Proceedings of the 2016 International Conference on Management of Data.ACM,2016:1199-1214. [6]GIUGNO R,SHASHA D.Graphgrep:A fast and universal method for querying graphs[C]∥16th International Conference on Pattern Recognition.IEEE,2002:112-115. [7]ZOU L,CHEN L,YU J X,et al.A novel spectral coding in a large graph database[C]∥Proceedings of the 11th International Conference on Extending Database Technology:Advances in database technology.ACM,2008:181-192. [8]HE H,SINGH A K.Closure-tree:An index structure for graph queries[C]∥Proceedings of the 22nd International Conference on Data Engineering(ICDE’06).IEEE,2006:38-38. [9]SUN Z,WANG H,WANG H,et al.Efficient subgraph matc- hing on billion node graphs[J].Proceedings of the VLDB Endowment,2012,5(9):788-799. [10]SHAO B,WANG H,LI Y.Trinity:A distributed graph engine on a memory cloud[C]∥Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data.ACM,2013:505-516. [11]ZHAO Z,WANG G,BUTT A R,et al.Sahad:Subgraph analysis in massive networks using hadoop[C]∥2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS).IEEE,2012:390-401. [12]ALON N,DAO P,HAJIRASOULIHA I,et al.Biomolecular network motif counting and discovery by color coding[J].Bioinformatics,2008,24(13):i241-i249. [13]DEAN J,GHEMAWAT S.MapReduce:simplified data proces- sing on large clusters[J].Communications of the ACM,2008,51(1):107-113. [14]LAI L,QIN L,LIN X,et al.Scalable subgraph enumeration in mapreduce[J].Proceedings of the VLDB Endowment,2015,8(10):974-985. [15]SHAO Y,CUI B,CHEN L,et al.Parallel subgraph listing in a large-scale graph[C]∥Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.ACM,2014:625-636. [16]REZA T,KLYMKO C,RIPEANU M,et al.Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs[C]∥2017 IEEE International Conference on Cluster Computing (CLUSTER).IEEE,2017:1-12. [17]SUO B,LI Z,PAN W.Parallel subgraph matching on massive graphs[C]∥International Congress on Image and Signal Processing,BioMedical Engineering and Informatics (CISP-BMEI).IEEE,2016:1932-1937. [18]BENLIC U,HAO J K.A multilevel memetic approach for improving graph k-partitions[J].IEEE Transactions on Evolutiona-ry Computation,2011,15(5):624-642. [19]ZAHARIA M,CHOWDHURY M,FRANKLIN M J,et al. Spark:Cluster computing with working sets[J].HotCloud,2010,10(10-10):95. [20]LIU X,ZHOU Y,GUAN X,et al.A feasible graph partition framework for parallel computing of big graph[J].Knowledge-Based Systems,2017,134:228-239. [21]KARYPIS G,KUMAR V.A fast and high quality multilevel scheme for partitioning irregular graphs[J].SIAM Journal on Scientific Computing,1998,20(1):359-392. [22]KERNIGHAN B W,LIN S.An efficient heuristic procedure for partitioning graphs[J].The Bell System Technical Journal,1970,49(2):291-307. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 傅丽玉, 陆歌皓, 吴义明, 罗娅玲. 区块链技术的研究及其发展综述 Overview of Research and Development of Blockchain Technology 计算机科学, 2022, 49(6A): 447-461. https://doi.org/10.11896/jsjkx.210600214 |
[3] | 杨亚红, 王海瑞. 基于Renyi熵和BiGRU算法实现SDN环境下的DDoS攻击检测方法 DDoS Attack Detection Method in SDN Environment Based on Renyi Entropy and BiGRU Algorithm 计算机科学, 2022, 49(6A): 555-561. https://doi.org/10.11896/jsjkx.210800095 |
[4] | 孙浩, 毛瀚宇, 张岩峰, 于戈, 徐石成, 何光宇. 区块链跨链技术发展及应用 Development and Application of Blockchain Cross-chain Technology 计算机科学, 2022, 49(5): 287-295. https://doi.org/10.11896/jsjkx.210800132 |
[5] | 冯了了, 丁滟, 刘坤林, 马科林, 常俊胜. 区块链BFT共识算法研究进展 Research Advance on BFT Consensus Algorithms 计算机科学, 2022, 49(4): 329-339. https://doi.org/10.11896/jsjkx.210700011 |
[6] | 梁静茹, 鄂海红, 宋美娜. 基于属性图模型的领域知识图谱构建方法 Method of Domain Knowledge Graph Construction Based on Property Graph Model 计算机科学, 2022, 49(2): 174-181. https://doi.org/10.11896/jsjkx.210500076 |
[7] | 谭双杰, 林宝军, 刘迎春, 赵帅. 基于机器学习的分布式星载RTs系统负载调度算法 Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning 计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126 |
[8] | 王如斌, 李瑞远, 何华均, 刘通, 李天瑞. 面向海量空间数据的分布式距离连接算法 Distributed Distance Join Algorithm for Massive Spatial Data 计算机科学, 2022, 49(1): 95-100. https://doi.org/10.11896/jsjkx.210100060 |
[9] | 张杰, 岳韶华, 王刚, 刘家义, 姚小强. 基于Stackelberg与边拉普拉斯矩阵的多智能体系统 Multi-agent System Based on Stackelberg and Edge Laplace Matrix 计算机科学, 2021, 48(8): 253-262. https://doi.org/10.11896/jsjkx.200700032 |
[10] | 唐飞, 陈云龙, 冯卓. 基于区块链和代理重加密的电子处方共享方案 Electronic Prescription Sharing Scheme Based on Blockchain and Proxy Re-encryption 计算机科学, 2021, 48(6A): 498-503. https://doi.org/10.11896/jsjkx.201000143 |
[11] | 钱甜甜, 张帆. 基于分布式边缘计算的情绪识别系统 Emotion Recognition System Based on Distributed Edge Computing 计算机科学, 2021, 48(6A): 638-643. https://doi.org/10.11896/jsjkx.201000010 |
[12] | 黄梅根, 刘川, 杜欢, 刘佳乐. 基于知识图谱的认知诊断模型及其在教辅中的应用研究 Research on Cognitive Diagnosis Model Based on Knowledge Graph and Its Application in Teaching Assistant 计算机科学, 2021, 48(6A): 644-648. https://doi.org/10.11896/jsjkx.200700163 |
[13] | 高枫越, 王琰, 朱铁兰. 有适应力的分布式状态估计方法 Resilient Distributed State Estimation Algorithm 计算机科学, 2021, 48(5): 308-312. https://doi.org/10.11896/jsjkx.200300117 |
[14] | 张航, 唐聃, 蔡红亮. 分布式存储系统中的预测式纠删码研究 Study on Predictive Erasure Codes in Distributed Storage System 计算机科学, 2021, 48(5): 130-139. https://doi.org/10.11896/jsjkx.200300124 |
[15] | 丁诗铭, 王天荆, 沈航, 白光伟. 基于能量分类器的抗SSDF攻击协作频谱感知算法 Energy Classifier Based Cooperative Spectrum Sensing Algorithm for Anti-SSDF Attack 计算机科学, 2021, 48(2): 282-288. https://doi.org/10.11896/jsjkx.191100124 |
|