计算机科学 ›› 2024, Vol. 51 ›› Issue (4): 67-77.doi: 10.11896/jsjkx.230800193

• 高性能计算 • 上一篇    下一篇

基于国产DCU异构平台的图匹配算法移植与优化

郝萌1, 田雪洋1, 鲁刚钊2, 刘义3, 张伟哲1, 何慧1   

  1. 1 哈尔滨工业大学计算学部 哈尔滨150001
    2 中国电子科技南湖研究院 浙江 嘉兴314001
    3 中国科学院大学计算机科学与技术学院 北京100049
  • 收稿日期:2023-08-30 修回日期:2024-01-09 出版日期:2024-04-15 发布日期:2024-04-10
  • 通讯作者: 郝萌(haomeng@hit.edu.cn)
  • 基金资助:
    国家自然科学基金青年基金(62202123);国家自然科学基金联合重点基金(U22A2036);国家重点研发计划(2020YFB1406902);广东省重点研发计划(2020B0101360001);光合基金C类(202202033706)

Transplantation and Optimization of Graph Matching Algorithm Based on Domestic DCUHeterogeneous Platform

HAO Meng1, TIAN Xueyang1, LU Gangzhao2, LIU Yi3, ZHANG Weizhe1, HE Hui1   

  1. 1 Faculty of Computing,Harbin Institute of Technology,Harbin 150001,China
    2 China Nanhu Academy of Electronics and Information Technology,Jiaxing,Zhejiang 314001,China
    3 School of Computer Science and Technology,University of Chinese Academy of Sciences,Beijing 100049,China
  • Received:2023-08-30 Revised:2024-01-09 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62202123),Joint Funds of the National Natural Science Foundation of China(U22A2036),National Key Research and Development Program of China(2020YFB1406902),Key-Area Research and Development Program of Guangdong Province(2020B0101360001) and GH fund C(202202033706).

摘要: 子图匹配是一种基础的图算法,被广泛应用于社交网络、图神经网络等众多领域。随着图数据规模的增长,人们迫切需要高效的子图匹配算法。GENEVA是一种基于GPU的并行子图匹配算法,其利用区间索引的图存储结构和并行匹配优化方法,能够大幅度减少存储开销,提升子图匹配性能。但由于平台底层硬件架构和编译环境的不同,GENEVA无法直接应用到国产DCU异构平台。为了解决该问题,提出了GENEVA面向国产DCU的移植和优化方案。IO时间开销是GENEVA算法主要的性能瓶颈,文中采用锁页内存、预加载、调度器3种优化策略来突破该瓶颈。其中,锁页内存技术避免了从可分页内存到临时锁页内存的额外数据传输,在DCU平台上大幅度减少了IO传输的时间开销;预加载技术将IO数据传输与DCU核函数计算重叠,掩盖了IO时间开销;调度器在满足预加载需求的同时,减少了冗余数据的传输。在3个不同规模的真实数据集上进行实验,结果表明,采用优化策略后算法性能显著提高。在92.6%的测试用例上,经过优化的GENEVA-HIP算法在国产DCU平台的执行时间比移植前的GENEVA算法在GPU服务器的执行时间短。在较大规模的数据集上,优化的GENEVA-HIP算法在DCU平台上的执行时间相比移植前的GENEVA算法在GPU服务器的执行时间减少了52.73%。

关键词: 子图匹配, DCU, 异构平台, HIP, 移植和优化

Abstract: Subgraph matching is a basic graph algorithm that is widely used in various fields such as social networks and graph neural networks.As the scale of graph data grows,there is an increasing need for efficient subgraph matching algorithms.GENEVA is a GPU-based parallel subgraph matching algorithm.It uses the interval index graph storage structure and parallel matching optimization method to greatly reduce storage overhead and improve subgraph matching performance.However,due to the diffe-rence in the underlying hardware architecture and compilation environment of the platform,GENEVA cannot be directly applied to the domestic DCU platform.In order to solve this problem,this paper proposes GENEVA's transplantation and optimization scheme for domestic DCU.IO time consumption is the main performance bottleneck of GENEVA algorithm.This paper proposes three optimization strategies of page-locked memory,preloading,and scheduler to alleviate this bottleneck.Among them,page-locked memory technology avoids additional data transfer from pageable memory to temporary page-locked memory,and greatly reduces the time consumption of IO transfer on the DCU platform.The preloading technology overlaps IO data transmission with DCU kernel function computation to mask IO time consumption.The scheduler reduces redundant data transfer while satisfy preloading requirements.In this paper,Experiments are carried out on three real-world datasets of different sizes,and the results show that the algorithm performance is significantly improved after using the optimization strategies.On 92.6% of the test cases,the optimized GENEVA-HIP execution time on the Sugon DCU platform is less than that of the unported GENEVA on the GPU server.On a larger dataset,the execution time of the optimized Geneva-HIP algorithm on the DCU platform is reduced by 52.73% compared with the the pre-port GENEVA algorithm on the GPU server.

Key words: Subgraph matching, DCU, Heterogeneous platform, HIP, Transplantation and optimization

中图分类号: 

  • TP391
[1]LACASA L,LUQUE B,BALLESTEROS F,et al.From timeseries to complex networks:The visibility graph[J].Procee-dings of the National Academy of Sciences,2008,105(13):4972-4975.
[2]FOGGIA P,SANSONE C,VENTO M.A performance comparison of five algorithms for graph isomorphism[C]//Proceedings of the 3rd IAPR TC-15 Workshop on Graph-based Representations in Pattern Recognition.2001:188-199.
[3]ZOU L.Research on Subgraph Query Algorithm in Graph Database[D].Wuhan:Huazhong University of Science and Techno-logy,2009.
[4]LU G Z.Research on Performance Model and Optimization of Parallel Applications on Heterogeneous Platforms[D].Heilongjiang:Harbin Institute of Technology,2021.
[5]FAN L L,QIAO Y H,LI J F,et al.CP2K software transplantation and optimization based on domestic c86 processor [J].Computer Science,2023,50(6):58-65.
[6]ZHANG Y C X,FENG L B,LIANG J G.Research on parallel performance optimization of sea ice Model using eOSematic Supercomputer [J].Journal of Software Guide,2019,20(6):80-85.
[7]YANG X Y,ZHAO R C,WANG H S,et al.DCU non-uniform control flow oriented compiler optimization [J].Computer Application,2023,43(10):3170-3177.
[8]YANG S C,ZHAO R C,HAN L,et al.DCU oriented Shared memory access to quantify [J/OL].https://doi.org/10.19678/j.issn.1000-3428.0067210.
[9]CONTE D,FOGGIA P,SANSONE C,et al.Thirty years ofgraph matching in pattern recognition[J].International Journal of Pattern Recognition and Artificial Intelligence,2004,18(3):265-298.
[10]ULLMANN J R.An algorithm for subgraph isomorphism[J].Journal of the ACM(JACM),1976,23(1):31-42.
[11]CORDELLA L P,FOGGIA P,SANSONE C,et al.An improved algorithm for matching large graphs[C]//3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition.2001:149-159.
[12]SHANG H,ZHANG Y,LIN X,et al.Taming verification hardness:an efficient algorithm for testing subgraph isomorphism[J].Proceedings of the VLDB Endowment,2008,1(1):364-375.
[13]HE H,SINGH A K.Graphs-at-a-time:query language and ac-cess methods for graph databases[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.2008:405-418.
[14]ZHAO P,HAN J.On graph query optimization in large net-works[J].Proceedings of the VLDB Endowment,2010,3(1/2):340-351.
[15]SUN S,LUO Q.Subgraph matching with effective matching order and indexing[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(1):491-505.
[16]HAN W S,LEE J,LEE J H.Turboiso:towards ultrafast and robust subgraph isomorphism search in large graph databases[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data.2013:337-348.
[17]REN X,WANG J.Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs[J].Proceedings of the VLDB Endowment,2015,8(5):617-628.
[18]HAN M,KIM H,GU G,et al.Efficient subgraph matching:Harmonizing dynamic programming,adaptive matching order,and failing set together[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1429-1446.
[19]RIVERO C R,JAMIL H M.Efficient and scalable labeled subgraph matching using SGMatch[J].Knowledge and Information Systems,2017,51(1):61-87.
[20]HU Y,LIU H,HUANG H H.Tricore:Parallel triangle coun-ting on gpus[C]//SC18:International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2018:171-182.
[21]GUO W,LI Y,SHA M,et al.Gpu-accelerated subgraph enumeration on partitioned graphs[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:1067-1082.
[22]GUO W,LI Y,TAN K L.Exploiting Reuse for GPU Subgraph Enumeration[J].IEEE Transactions on Knowledge and Data Engineering,2022,34(2):4231-4244.
[23]ZENG L,ZOU L,ÖZSU M T,et al.Gsi:Gpu-friendly subgraph isomorphism[C]//2020 IEEE 36th International Conference on Data Engineering(ICDE).IEEE,2020:1249-1260.
[24]AHO A V,DENNING P J,ULLMAN J D.Principles of optimal page replacement[J].Journal of the ACM(JACM),1971,18(1):80-93.
[25]ROSSI R,AHMED N.The network data repository with interactive graph analytics and visualization[C]//Twenty-ninth AAAI Conference on Artificial Intelligence.2015.
[26]LI C H,CUI P J,YUAN Y,et al.Multi-GPU programming model for large graph subgraph matching [J].Computer Science and Exploration,2023,17(7):1576-1585.
[27]MENG K,LIN Z H,TAN G M.GPU-based Subgraph Matching Optimization Technology [J].High Technology Communication,2022,32(1):1-12.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!