计算机科学 ›› 2025, Vol. 52 ›› Issue (1): 151-159.doi: 10.11896/jsjkx.231200073
李纯羽1, 邓龙1, 李永坤1,2, 许胤龙1,2
LI Chunyu1, DENG Long1, LI Yongkun1,2, XU Yinlong1,2
摘要: 图数据在各种应用中日益普及,其因涵盖多种实体类型和存在丰富的关联关系而备受关注。对于图数据库用户而言,高效的图查询服务是保障系统性能的关键因素。随着数据量增加,单机图数据库很难满足将所有数据存储在内存中的需求,而分布式图数据库在拓展性和资源利用率方面受到挑战。基于RDMA的远程内存系统的引入为克服这些挑战提供了一种新的选择,通过分离计算和存储资源,实现了更为灵活的内存使用方式。然而,在使用远程内存的情况下如何最大程度地优化图查询性能成为了当前研究的重点问题。文中首先分析了利用操作系统分页机制透明使用远程内存构建图数据库存在的问题,并在应用层次上设计了远程内存图数据库的存储模型。根据不同数据的特点和访问模式,设计了属性图在远程内存中的存储结构,优化了数据布局和访问路径。实验结果表明,在本地内存受限的情况下,与透明使用远程内存相比,应用感知的设计方式的端到端性能最高提升了12倍。
中图分类号:
[1]Neo4J,Inc.Neo4J[OL].2007[2023-11-21].https://neo4j.com/. [2]JanusGraph.JanusGraph[OL].2017[2023-11-21].https://janusgraph.org/ [3]DEUTSCH A,XU Y,WU M,et al.Tigergraph:A native MPP graph database[J].arXiv:1901.08248,2019. [4]WU M,YI X,YU H,et al.Nebula Graph:An open source distributed graph database[J].arXiv:2206.07278,2022. [5]LIU Q,LI Y,DUAN H,et al.Knowledge Graph Construction Techniques[J].Journal of Computer Research and Development,2016,53(3):582-600. [6]SU L,QIN X,ZHANG Z,et al.Banyan:a scoped dataflow engine for graph query service[J].arXiv:2202.12530,2022. [7]SAHU S,MHEDHBI A,SALIHOGLU S,et al.The ubiquity of large graphs and surprising challenges of graph processing:extended survey[J].The VLDB journal,2020,29:595-618. [8]CHEN H,WU B,DENG S,et al.High performance distributed OLAP on property graphs with grasper[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:2705-2708. [9]BURAGOHAIN C,RISVIK K M,BRETT P,et al.A1:A distributed in-memory graph database[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:329-344. [10]EWAIS M,CHOW P.Disaggregated Memory in the Datacenter:A Survey[J].IEEE Access,2023,11:20688-20712. [11]WANG C,QIAO Y,MA H,et al.Canvas:Isolated and Adaptive Swapping for {Multi-Applications} on Remote Memory[C]//20th USENIX Symposium on Networked Systems Design and Implementation(NSDI 23).2023:161-179. [12]RUAN Z,SCHWARZKOPF M,AGUILERA M K,et al.{AIFM}:{High-Performance},{Application-Integrated} far memory[C]//14th USENIX Symposium on Operating Systems Design and Implementation(OSDI 20).2020:315-332. [13]QIAO Y,WANG C,RUAN Z,et al.Hermit:{Low-Latency},{High-Throughput},and Transparent Remote Memory via {Feedback-Directed} Asynchrony[C]//20th USENIX Symposium on Networked Systems Design and Implementation(NSDI 23).2023:181-198. [14]AMARO E,BRANNER-AUGMON C,LUO Z,et al.Can farmemory improve job throughput?[C]//Proceedings of the Fifteenth European Conference on Computer Systems.2020:1-16. [15]GU J,LEE Y,ZHANG Y,et al.Efficient memory disaggregation with infiniswap[C]//14th USENIX Symposium on Networked Systems Design and Implementation(NSDI 17).2017:649-667. [16]ANGLES R.The Property Graph Database Model[C]//AMW.2018. [17]CHEN H,LI C,FANG J,et al.Grasper:A high performance distributed system for OLAP on property graphs[C]//Procee-dings of the ACM Symposium on Cloud Computing.2019:87-100. [18]BESTA M,GERSTENBERGER R,PETER E,et al.Demystifying graph databases:Analysis and taxonomy of data organization,system designs,and graph queries[J].ACM Computing Surveys,2023,56(2):1-40. [19]ANGLES R,ANTAL J B,AVERBUCH A,et al.The LDBC social network benchmark[J].arXiv:2001.02299,2020. [20]AGUILERA M K,KEETON K,NOVAKOVIC S,et al.Designing far memory data structures:Think outside the box[C]//Proceedings of the Workshop on Hot Topics in Operating Systems.2019:120-126. [21]LI C,CHEN H,ZHANG S,et al.ByteGraph:a high-perfor-mance distributed graph database in ByteDance[J].Proceedings of the VLDB Endowment,2022,15(12):3306-3318. [22]CHEN H,LI C,ZHENG C,et al.G-tran:a high performance distributed graph database with a decentralized architecture[J].Proceedings of the VLDB Endowment,2022,15(11):2545-2558. [23]WEI X,SHI J,CHEN Y,et al.Fast in-memory transaction processing using RDMA and HTM[C]//Proceedings of the 25th Symposium on Operating Systems Principles.2015:87-104. [24]RODRIGUEZ M A.The gremlin graph traversal machine and language(invited talk)[C]//Proceedings of the 15th Symposium on Database Programming Languages.2015:1-10. [25]KALIA A,KAMINSKY M,ANDERSEN D G.Design guide-lines for high performance {RDMA} systems[C]//2016 USENIX Annual Technical Conference(USENIX ATC 16).2016:437-450. [26]DRAGOJEVIĆ A,NARAYANAN D,CASTRO M,et al.{FaRM}:Fast remote memory[C]//11th USENIX Symposium on Networked Systems Design and Implementation(NSDI 14).2014:401-414. [27]BESTA M,GERSTENBERGER R,FISCHER M,et al.The Graph Database Interface:Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2023:1-18. [28]WANG J,LI C,WANG T,et al.Excavating the potential of graph workload on rdma-based far memory architecture[C]//2022 IEEE International Parallel and Distributed Processing Symposium(IPDPS).IEEE,2022:1029-1039. [29]WANG J,LI C,LIU Y,et al.Fargraph+:Excavating the paral-lelism of graph processing workload on RDMA-based far memory system[J].Journal of Parallel and Distributed Computing,2023,177:144-159. [30]ZAHKA D,GAVRILOVSKA A.FAM-Graph:Graph analytics on disaggregated memory[C]//2022 IEEE International Parallel and Distributed Processing Symposium(IPDPS).IEEE,2022:81-92. |
|