计算机科学 ›› 2024, Vol. 51 ›› Issue (8): 56-62.doi: 10.11896/jsjkx.231000155

• 数据库&大数据&数据科学 • 上一篇    下一篇

面向幂律图的动态图存储结构Power-PCSR

毛志雄1, 刘志楠1, 高叙宁2, 王蒙湘3, 巩树凤1, 张岩峰1   

  1. 1 东北大学计算机科学与工程学院 沈阳 110167
    2 东北大学医学与生物信息工程学院 沈阳 110167
    3 中国标准化研究院 北京 100088
  • 收稿日期:2023-10-23 修回日期:2024-03-31 出版日期:2024-08-15 发布日期:2024-08-13
  • 通讯作者: 王蒙湘(wangmx@cnis.ac.cn)
  • 作者简介:(20215949@stu.neu.edu.cn)
  • 基金资助:
    国家自然科学基金青年科学基金(62202088)

Power-PCSR:An Efficient Dynamic Graph Storage Structure for Power-law Graphs

MAO Zhixiong1, LIU Zhinan1, GAO Xuning2, WANG Mengxiang3, GONG Shufeng1, ZHANG Yanfeng1   

  1. 1 School of Computer Science and Engineering,Northeastern University,Shenyang 110167,China
    2 College of Medicine and Biological Information Engineering,Northeastern University,Shenyang 110167,China
    3 China National Institute of Standardization,Beijing 100088,China
  • Received:2023-10-23 Revised:2024-03-31 Online:2024-08-15 Published:2024-08-13
  • About author:MAO Zhixiong,born in 2002,undergraduate,is a member of CCF(No.Q7701G).His main research interests include graph computation and storage.
    WANG Mengxiang,born in 1991,Ph.D,research assistant,is a member of CCF(No.72448M).Her main research interests include graph data mining,geographic information system(GIS) and international standardization.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China(62202088).

摘要: 图数据在现实生活中广泛存在,且不断发生变化。传统高效的静态图存储方式——压缩行/列(Compressed Sparse Row/Column,CSR/CSR)存储方式在更新图数据时需要大量的数据迁移,不适用于动态图数据。而能够高效更新图数据的邻接表(Adjacency List,AL) 存储方式往往带有大量的指针,导致其图数据读取和分析效率低。Packed Compressed Sparse Row(PCSR)是一种基于CSR的动态图存储结构。该结构在存储边数据时并不是采用连续无空隙数组,而是采用留有空槽的压缩存储阵列(Packed Memory Arrays,PMA)结构,便于边数据的插入。因此,PCSR支持高效图更新和图分析。但是,PCSR在存储幂律图时,其性能容易受大度数顶点的影响。为此,基于PCSR提出一种支持可高效更新和分析动态幂律图的图存储结构Power-PCSR。该结构将幂律图中度数较大的顶点单独存储在一个独立的PMA中,其他所有小度数顶点与PCSR一样存储在原PMA中。小度顶点变化导致的数据迁移不会触及大度数顶点,从而大大减少了数据迁移数量;同样,大度数顶点更新导致的数据迁移只限制在每个大度数顶点的PMA内部,不会涉及小度数顶点和其他大度数顶点的数据迁移。实验显示,Power-PCSR在分析图数据时与PCSR具有相似的性能,而在更新图数据时比PCSR快2倍。

关键词: 动态图存储, 动态图更新, 数据迁移, Power-PCSR, 幂律图

Abstract: Graph data is widespread in real life and changes over time.The traditional efficient static graph storage structure,compressed sparse row/column(CSR/CSC) requires a large amount of data migration when inserting/deleting edges to/from graphs,which is not suitable for dynamic graphs.Although the adjacency list(AL) is able to update graphs efficiently,it is inefficient in reading and analyzing graphs since it has a large number of pointers,which results in random memory access.PCSR is a novel dynamic graph storage structure based on CSR.It employs a packed memory arrays(PMA) to store the edges rather than a continuous array.Because there are empty slots in PMA,it is easier to insert/delete edges.Thus,packed compressed sparse row(PCSR) is efficient in both graph updating and analysis.However,we find that the performance of PCSR suffers from large degree vertices when storing power-law graphs.For this,this paper proposes a new graph storage structure based on PCSR,Power-PCSR,which supports efficient updating and analysis of dynamic power-law graphs.In Power-PCSR,each large-degree vertex is stored in an independent PMA separately,and other vertices with small degrees are stored in a PMA.The data migration caused by the small-degree vertices will not lead to the migration of large-degree vertices,thus greatly reducing the amount of data migration.Similarly,the data migration caused by the update of large-degree vertices is only limited to its PMA,and will not involve the data migration of other large-degree vertices and small-degree vertices.Experiments show that Power-PCSR has similar performance to PCSR when analyzing graphs,and is 2 times faster than PCSR when updating graph data.

Key words: Dynamic graph storage, Dynamic graph updates, Data migration, Power-PCSR, Power-law graph

中图分类号: 

  • TP391
[1]YU G,GU Y,BAO Y B.Large Scale Graph Data Processing on Cloud Computing Environment[J].Chinese Journal of Compu-ters,2011,34(10):1753-1767.
[2]LANE P A,BOOTH J D.Heterogeneous sparse matrix-vector multiplication via compressed sparse row format[J].Parallel Computing,2023,115:102997.
[3]Episode 3 Of 4:How To Increase Engagement on Twitter[EB/OL].https://www.sirensearch.co.uk/2019/05/23/episode-3-of-4-how-to-increase-engagement-on-twitter/.
[4]Weibo 2020 User Development Report[R/OL].https://data.weibo.com/report/reportDetail?id=456.
[5]TIAN J,PANG Y.Adjoin:A causal consistency model based on the adjacency list in a distributed system[J].Concurrency and Computation:Practice and Experience,2020,32(22):e5835.
[6]WHEATMAN B,XU H.Packed Compressed Sparse Row:ADynamic Graph Representation[C]//2018 IEEE High Perfor-mance Extreme Computing Conference(HPEC).Waltham,MA,USA,2018:1-7.
[7]BENDER M A,EBRAHIMI R,HU H,et al.B-Trees andCache-Oblivious B-Trees with Different-Sized Atomic Keys[J].ACM Transactions on Database Systems,2016,41(3):1-33.
[8]DE LEO D,BONCZ P.Packed Memory Arrays-Rewired[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).Macao,China,2019:830-841.
[9]ZHANG S,JIANG Z,HOU X, et al.DRONE:An Efficient Distributed Subgraph-Centric Framework for Processing Large-Scale Power-law Graphs[J].IEEE Transactions on Parallel and Distributed Systems,2023,34(2):463-474.
[10]PANDEY P,WHEATMAN B,XU H,et al.Terrace:A Hierarchical Graph Container for Skewed Dynamic Graphs[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,New York,NY,USA,2021:1372-1385.
[11]FUCHS P,MARGAN D,GICEVA J.Sortledton:a universal,transactional graph data structure[J].The VLDB Endowment,2022,15(6):1173-1186.
[12]FENG G,MA Z,LI D,et al.RisGraph:A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,2021:513-527.
[13]SAFAEE S,MIRABI M,RAHMANI A M,et al.A distributed B+ Tree indexing method for processing range queries over streaming data[J].Cluster Computing,2024,27(3):1251-1274.
[14]HE J,YAO S,CAI L,et al.SLC-index:A scalable skip list-basedindex for cloud data processing[J].Journal of Central South University,2018,25(10):2438-2450.
[15]CHANG J W,CHEN T Y.When B-Tree Meets Skyrmion Memory:How Skyrmion Memory Affects an Indexing Scheme[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2022,41(11):3814-3825.
[16]BENDER M A,DEMAINE E D,et al.Cache-oblivious B-trees[J].SIAM Journal on Computing,2005,35(2):341-358.
[17]ARRIGO F,HIGHAM D J,NOFERINI V.Non-backtracking pagerank[J].Journal of Scientific Computing,2019,80(3):1419-1437.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!