计算机科学 ›› 2024, Vol. 51 ›› Issue (8): 56-62.doi: 10.11896/jsjkx.231000155
毛志雄1, 刘志楠1, 高叙宁2, 王蒙湘3, 巩树凤1, 张岩峰1
MAO Zhixiong1, LIU Zhinan1, GAO Xuning2, WANG Mengxiang3, GONG Shufeng1, ZHANG Yanfeng1
摘要: 图数据在现实生活中广泛存在,且不断发生变化。传统高效的静态图存储方式——压缩行/列(Compressed Sparse Row/Column,CSR/CSR)存储方式在更新图数据时需要大量的数据迁移,不适用于动态图数据。而能够高效更新图数据的邻接表(Adjacency List,AL) 存储方式往往带有大量的指针,导致其图数据读取和分析效率低。Packed Compressed Sparse Row(PCSR)是一种基于CSR的动态图存储结构。该结构在存储边数据时并不是采用连续无空隙数组,而是采用留有空槽的压缩存储阵列(Packed Memory Arrays,PMA)结构,便于边数据的插入。因此,PCSR支持高效图更新和图分析。但是,PCSR在存储幂律图时,其性能容易受大度数顶点的影响。为此,基于PCSR提出一种支持可高效更新和分析动态幂律图的图存储结构Power-PCSR。该结构将幂律图中度数较大的顶点单独存储在一个独立的PMA中,其他所有小度数顶点与PCSR一样存储在原PMA中。小度顶点变化导致的数据迁移不会触及大度数顶点,从而大大减少了数据迁移数量;同样,大度数顶点更新导致的数据迁移只限制在每个大度数顶点的PMA内部,不会涉及小度数顶点和其他大度数顶点的数据迁移。实验显示,Power-PCSR在分析图数据时与PCSR具有相似的性能,而在更新图数据时比PCSR快2倍。
中图分类号:
[1]YU G,GU Y,BAO Y B.Large Scale Graph Data Processing on Cloud Computing Environment[J].Chinese Journal of Compu-ters,2011,34(10):1753-1767. [2]LANE P A,BOOTH J D.Heterogeneous sparse matrix-vector multiplication via compressed sparse row format[J].Parallel Computing,2023,115:102997. [3]Episode 3 Of 4:How To Increase Engagement on Twitter[EB/OL].https://www.sirensearch.co.uk/2019/05/23/episode-3-of-4-how-to-increase-engagement-on-twitter/. [4]Weibo 2020 User Development Report[R/OL].https://data.weibo.com/report/reportDetail?id=456. [5]TIAN J,PANG Y.Adjoin:A causal consistency model based on the adjacency list in a distributed system[J].Concurrency and Computation:Practice and Experience,2020,32(22):e5835. [6]WHEATMAN B,XU H.Packed Compressed Sparse Row:ADynamic Graph Representation[C]//2018 IEEE High Perfor-mance Extreme Computing Conference(HPEC).Waltham,MA,USA,2018:1-7. [7]BENDER M A,EBRAHIMI R,HU H,et al.B-Trees andCache-Oblivious B-Trees with Different-Sized Atomic Keys[J].ACM Transactions on Database Systems,2016,41(3):1-33. [8]DE LEO D,BONCZ P.Packed Memory Arrays-Rewired[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).Macao,China,2019:830-841. [9]ZHANG S,JIANG Z,HOU X, et al.DRONE:An Efficient Distributed Subgraph-Centric Framework for Processing Large-Scale Power-law Graphs[J].IEEE Transactions on Parallel and Distributed Systems,2023,34(2):463-474. [10]PANDEY P,WHEATMAN B,XU H,et al.Terrace:A Hierarchical Graph Container for Skewed Dynamic Graphs[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,New York,NY,USA,2021:1372-1385. [11]FUCHS P,MARGAN D,GICEVA J.Sortledton:a universal,transactional graph data structure[J].The VLDB Endowment,2022,15(6):1173-1186. [12]FENG G,MA Z,LI D,et al.RisGraph:A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,2021:513-527. [13]SAFAEE S,MIRABI M,RAHMANI A M,et al.A distributed B+ Tree indexing method for processing range queries over streaming data[J].Cluster Computing,2024,27(3):1251-1274. [14]HE J,YAO S,CAI L,et al.SLC-index:A scalable skip list-basedindex for cloud data processing[J].Journal of Central South University,2018,25(10):2438-2450. [15]CHANG J W,CHEN T Y.When B-Tree Meets Skyrmion Memory:How Skyrmion Memory Affects an Indexing Scheme[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2022,41(11):3814-3825. [16]BENDER M A,DEMAINE E D,et al.Cache-oblivious B-trees[J].SIAM Journal on Computing,2005,35(2):341-358. [17]ARRIGO F,HIGHAM D J,NOFERINI V.Non-backtracking pagerank[J].Journal of Scientific Computing,2019,80(3):1419-1437. |
|