计算机科学 ›› 2025, Vol. 52 ›› Issue (1): 151-159.doi: 10.11896/jsjkx.231200073

• 数据库&大数据&数据科学 • 上一篇    下一篇

面向远程内存图数据库的应用感知分离式存储设计

李纯羽1, 邓龙1, 李永坤1,2, 许胤龙1,2   

  1. 1 中国科学技术大学计算机科学与技术学院 合肥 230026
    2 高性能计算安徽省重点实验室 合肥 230026
  • 收稿日期:2023-12-11 修回日期:2024-05-13 出版日期:2025-01-15 发布日期:2025-01-09
  • 通讯作者: 李永坤(ykli@ustc.edu.cn)
  • 作者简介:(lichunyu@mail.ustc.edu.cn)
  • 基金资助:
    国家自然科学基金(62172382)

Application-aware Disaggregated Storage Design for Remote Memory Graph Database

LI Chunyu1, DENG Long1, LI Yongkun1,2, XU Yinlong1,2   

  1. 1 School of Computer Science and Technology,University of Science and Technology of China,Hefei 230026,China
    2 Anhui Province Key Laboratory of High Performance Computing,Hefei 230026,China
  • Received:2023-12-11 Revised:2024-05-13 Online:2025-01-15 Published:2025-01-09
  • About author:LI Chunyu,born in 1999,postgraduate.Her main research interests include graph database and disaggregated me-mory architecture.
    LI Yongkun,born in 1986,professor,is a member of CCF(No.33184M).His main research interests include I/O optimization for data-intensive computing and storage systems,parallel and distributed systems,key-value systems,and cloud computing,etc.
  • Supported by:
    National Natural Science Foundation of China(62172382).

摘要: 图数据在各种应用中日益普及,其因涵盖多种实体类型和存在丰富的关联关系而备受关注。对于图数据库用户而言,高效的图查询服务是保障系统性能的关键因素。随着数据量增加,单机图数据库很难满足将所有数据存储在内存中的需求,而分布式图数据库在拓展性和资源利用率方面受到挑战。基于RDMA的远程内存系统的引入为克服这些挑战提供了一种新的选择,通过分离计算和存储资源,实现了更为灵活的内存使用方式。然而,在使用远程内存的情况下如何最大程度地优化图查询性能成为了当前研究的重点问题。文中首先分析了利用操作系统分页机制透明使用远程内存构建图数据库存在的问题,并在应用层次上设计了远程内存图数据库的存储模型。根据不同数据的特点和访问模式,设计了属性图在远程内存中的存储结构,优化了数据布局和访问路径。实验结果表明,在本地内存受限的情况下,与透明使用远程内存相比,应用感知的设计方式的端到端性能最高提升了12倍。

关键词: 图查询, 图数据库, 图存储, 远程内存, 属性图模型

Abstract: Graph data is becoming more widespread across various applications due to its features to represent multiple entity types and rich associated relationships.For users of graph databases,efficient graph query service is crucial to ensure system performance.As the amount of data grows,the single-machine graph database is difficult to meet the demand of storing all the data in memory,while distributed graph databases,which spread the data across the memory of multiple machines,face challenges in scalability and resource utilization rate.A new solution to these challenges is the introduction of RDMA-based remote memory systems.These systems separate the tasks of computing and storing graph data,offering a more flexible way to use memory.Yet,a big challenge in current solutions is how to ensure the performance of graph query when using remote memory.This study takes a close look at the challenges that come up when building remote memory graph databases on general-purpose far memory platforms,which use remote memory transparently.It suggests a new approach,where the design of the remote memory graph database is aware of how it’s being used.This design method creates a storage model that understands the different types of property graph data and how it’s accessed.Specifically,the study explains how to make sure the data is arranged and accessed in the best way.Experimental results show that when the local memory is limited,the awareness method outperforms the transparent approach,giving a 12x improvement in graph query performance.

Key words: Graph query, Graph database, Graph storage, Remote memory, Property graph model

中图分类号: 

  • TP311
[1]Neo4J,Inc.Neo4J[OL].2007[2023-11-21].https://neo4j.com/.
[2]JanusGraph.JanusGraph[OL].2017[2023-11-21].https://janusgraph.org/
[3]DEUTSCH A,XU Y,WU M,et al.Tigergraph:A native MPP graph database[J].arXiv:1901.08248,2019.
[4]WU M,YI X,YU H,et al.Nebula Graph:An open source distributed graph database[J].arXiv:2206.07278,2022.
[5]LIU Q,LI Y,DUAN H,et al.Knowledge Graph Construction Techniques[J].Journal of Computer Research and Development,2016,53(3):582-600.
[6]SU L,QIN X,ZHANG Z,et al.Banyan:a scoped dataflow engine for graph query service[J].arXiv:2202.12530,2022.
[7]SAHU S,MHEDHBI A,SALIHOGLU S,et al.The ubiquity of large graphs and surprising challenges of graph processing:extended survey[J].The VLDB journal,2020,29:595-618.
[8]CHEN H,WU B,DENG S,et al.High performance distributed OLAP on property graphs with grasper[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:2705-2708.
[9]BURAGOHAIN C,RISVIK K M,BRETT P,et al.A1:A distributed in-memory graph database[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:329-344.
[10]EWAIS M,CHOW P.Disaggregated Memory in the Datacenter:A Survey[J].IEEE Access,2023,11:20688-20712.
[11]WANG C,QIAO Y,MA H,et al.Canvas:Isolated and Adaptive Swapping for {Multi-Applications} on Remote Memory[C]//20th USENIX Symposium on Networked Systems Design and Implementation(NSDI 23).2023:161-179.
[12]RUAN Z,SCHWARZKOPF M,AGUILERA M K,et al.{AIFM}:{High-Performance},{Application-Integrated} far memory[C]//14th USENIX Symposium on Operating Systems Design and Implementation(OSDI 20).2020:315-332.
[13]QIAO Y,WANG C,RUAN Z,et al.Hermit:{Low-Latency},{High-Throughput},and Transparent Remote Memory via {Feedback-Directed} Asynchrony[C]//20th USENIX Symposium on Networked Systems Design and Implementation(NSDI 23).2023:181-198.
[14]AMARO E,BRANNER-AUGMON C,LUO Z,et al.Can farmemory improve job throughput?[C]//Proceedings of the Fifteenth European Conference on Computer Systems.2020:1-16.
[15]GU J,LEE Y,ZHANG Y,et al.Efficient memory disaggregation with infiniswap[C]//14th USENIX Symposium on Networked Systems Design and Implementation(NSDI 17).2017:649-667.
[16]ANGLES R.The Property Graph Database Model[C]//AMW.2018.
[17]CHEN H,LI C,FANG J,et al.Grasper:A high performance distributed system for OLAP on property graphs[C]//Procee-dings of the ACM Symposium on Cloud Computing.2019:87-100.
[18]BESTA M,GERSTENBERGER R,PETER E,et al.Demystifying graph databases:Analysis and taxonomy of data organization,system designs,and graph queries[J].ACM Computing Surveys,2023,56(2):1-40.
[19]ANGLES R,ANTAL J B,AVERBUCH A,et al.The LDBC social network benchmark[J].arXiv:2001.02299,2020.
[20]AGUILERA M K,KEETON K,NOVAKOVIC S,et al.Designing far memory data structures:Think outside the box[C]//Proceedings of the Workshop on Hot Topics in Operating Systems.2019:120-126.
[21]LI C,CHEN H,ZHANG S,et al.ByteGraph:a high-perfor-mance distributed graph database in ByteDance[J].Proceedings of the VLDB Endowment,2022,15(12):3306-3318.
[22]CHEN H,LI C,ZHENG C,et al.G-tran:a high performance distributed graph database with a decentralized architecture[J].Proceedings of the VLDB Endowment,2022,15(11):2545-2558.
[23]WEI X,SHI J,CHEN Y,et al.Fast in-memory transaction processing using RDMA and HTM[C]//Proceedings of the 25th Symposium on Operating Systems Principles.2015:87-104.
[24]RODRIGUEZ M A.The gremlin graph traversal machine and language(invited talk)[C]//Proceedings of the 15th Symposium on Database Programming Languages.2015:1-10.
[25]KALIA A,KAMINSKY M,ANDERSEN D G.Design guide-lines for high performance {RDMA} systems[C]//2016 USENIX Annual Technical Conference(USENIX ATC 16).2016:437-450.
[26]DRAGOJEVIĆ A,NARAYANAN D,CASTRO M,et al.{FaRM}:Fast remote memory[C]//11th USENIX Symposium on Networked Systems Design and Implementation(NSDI 14).2014:401-414.
[27]BESTA M,GERSTENBERGER R,FISCHER M,et al.The Graph Database Interface:Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2023:1-18.
[28]WANG J,LI C,WANG T,et al.Excavating the potential of graph workload on rdma-based far memory architecture[C]//2022 IEEE International Parallel and Distributed Processing Symposium(IPDPS).IEEE,2022:1029-1039.
[29]WANG J,LI C,LIU Y,et al.Fargraph+:Excavating the paral-lelism of graph processing workload on RDMA-based far memory system[J].Journal of Parallel and Distributed Computing,2023,177:144-159.
[30]ZAHKA D,GAVRILOVSKA A.FAM-Graph:Graph analytics on disaggregated memory[C]//2022 IEEE International Parallel and Distributed Processing Symposium(IPDPS).IEEE,2022:81-92.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!