Computer Science ›› 2024, Vol. 51 ›› Issue (8): 56-62.doi: 10.11896/jsjkx.231000155

• Database & Big Data & Data Science • Previous Articles     Next Articles

Power-PCSR:An Efficient Dynamic Graph Storage Structure for Power-law Graphs

MAO Zhixiong1, LIU Zhinan1, GAO Xuning2, WANG Mengxiang3, GONG Shufeng1, ZHANG Yanfeng1   

  1. 1 School of Computer Science and Engineering,Northeastern University,Shenyang 110167,China
    2 College of Medicine and Biological Information Engineering,Northeastern University,Shenyang 110167,China
    3 China National Institute of Standardization,Beijing 100088,China
  • Received:2023-10-23 Revised:2024-03-31 Online:2024-08-15 Published:2024-08-13
  • About author:MAO Zhixiong,born in 2002,undergraduate,is a member of CCF(No.Q7701G).His main research interests include graph computation and storage.
    WANG Mengxiang,born in 1991,Ph.D,research assistant,is a member of CCF(No.72448M).Her main research interests include graph data mining,geographic information system(GIS) and international standardization.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China(62202088).

Abstract: Graph data is widespread in real life and changes over time.The traditional efficient static graph storage structure,compressed sparse row/column(CSR/CSC) requires a large amount of data migration when inserting/deleting edges to/from graphs,which is not suitable for dynamic graphs.Although the adjacency list(AL) is able to update graphs efficiently,it is inefficient in reading and analyzing graphs since it has a large number of pointers,which results in random memory access.PCSR is a novel dynamic graph storage structure based on CSR.It employs a packed memory arrays(PMA) to store the edges rather than a continuous array.Because there are empty slots in PMA,it is easier to insert/delete edges.Thus,packed compressed sparse row(PCSR) is efficient in both graph updating and analysis.However,we find that the performance of PCSR suffers from large degree vertices when storing power-law graphs.For this,this paper proposes a new graph storage structure based on PCSR,Power-PCSR,which supports efficient updating and analysis of dynamic power-law graphs.In Power-PCSR,each large-degree vertex is stored in an independent PMA separately,and other vertices with small degrees are stored in a PMA.The data migration caused by the small-degree vertices will not lead to the migration of large-degree vertices,thus greatly reducing the amount of data migration.Similarly,the data migration caused by the update of large-degree vertices is only limited to its PMA,and will not involve the data migration of other large-degree vertices and small-degree vertices.Experiments show that Power-PCSR has similar performance to PCSR when analyzing graphs,and is 2 times faster than PCSR when updating graph data.

Key words: Dynamic graph storage, Dynamic graph updates, Data migration, Power-PCSR, Power-law graph

CLC Number: 

  • TP391
[1]YU G,GU Y,BAO Y B.Large Scale Graph Data Processing on Cloud Computing Environment[J].Chinese Journal of Compu-ters,2011,34(10):1753-1767.
[2]LANE P A,BOOTH J D.Heterogeneous sparse matrix-vector multiplication via compressed sparse row format[J].Parallel Computing,2023,115:102997.
[3]Episode 3 Of 4:How To Increase Engagement on Twitter[EB/OL].https://www.sirensearch.co.uk/2019/05/23/episode-3-of-4-how-to-increase-engagement-on-twitter/.
[4]Weibo 2020 User Development Report[R/OL].https://data.weibo.com/report/reportDetail?id=456.
[5]TIAN J,PANG Y.Adjoin:A causal consistency model based on the adjacency list in a distributed system[J].Concurrency and Computation:Practice and Experience,2020,32(22):e5835.
[6]WHEATMAN B,XU H.Packed Compressed Sparse Row:ADynamic Graph Representation[C]//2018 IEEE High Perfor-mance Extreme Computing Conference(HPEC).Waltham,MA,USA,2018:1-7.
[7]BENDER M A,EBRAHIMI R,HU H,et al.B-Trees andCache-Oblivious B-Trees with Different-Sized Atomic Keys[J].ACM Transactions on Database Systems,2016,41(3):1-33.
[8]DE LEO D,BONCZ P.Packed Memory Arrays-Rewired[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).Macao,China,2019:830-841.
[9]ZHANG S,JIANG Z,HOU X, et al.DRONE:An Efficient Distributed Subgraph-Centric Framework for Processing Large-Scale Power-law Graphs[J].IEEE Transactions on Parallel and Distributed Systems,2023,34(2):463-474.
[10]PANDEY P,WHEATMAN B,XU H,et al.Terrace:A Hierarchical Graph Container for Skewed Dynamic Graphs[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,New York,NY,USA,2021:1372-1385.
[11]FUCHS P,MARGAN D,GICEVA J.Sortledton:a universal,transactional graph data structure[J].The VLDB Endowment,2022,15(6):1173-1186.
[12]FENG G,MA Z,LI D,et al.RisGraph:A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s[C]//Proceedings of the 2021 International Conference on Management of Data(SIGMOD’21).Association for Computing Machinery,2021:513-527.
[13]SAFAEE S,MIRABI M,RAHMANI A M,et al.A distributed B+ Tree indexing method for processing range queries over streaming data[J].Cluster Computing,2024,27(3):1251-1274.
[14]HE J,YAO S,CAI L,et al.SLC-index:A scalable skip list-basedindex for cloud data processing[J].Journal of Central South University,2018,25(10):2438-2450.
[15]CHANG J W,CHEN T Y.When B-Tree Meets Skyrmion Memory:How Skyrmion Memory Affects an Indexing Scheme[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2022,41(11):3814-3825.
[16]BENDER M A,DEMAINE E D,et al.Cache-oblivious B-trees[J].SIAM Journal on Computing,2005,35(2):341-358.
[17]ARRIGO F,HIGHAM D J,NOFERINI V.Non-backtracking pagerank[J].Journal of Scientific Computing,2019,80(3):1419-1437.
[1] LU Ye-shan. Common Issues and Case Analysis of System Data Migration [J]. Computer Science, 2019, 46(6A): 412-416.
[2] ZHANG Yong, ZHANG Jie-hui and LIU Bin. Big Data Dynamic Migration Method Based on Global Load Balancing in Cloud Environment [J]. Computer Science, 2018, 45(1): 196-199.
[3] ZHENG Sheng and LI Tong. Data Placement Algorithm for Large-scale Storage System [J]. Computer Science, 2013, 40(Z11): 270-273.
[4] SHI Guang-yuan and ZHANG Yu. Hierarchical Storage Access Model Based on Multi-Attributes Measurement [J]. Computer Science, 2013, 40(Z11): 165-169.
[5] LUO Xiang-yu,WANG Yun and CHEN Xiao-mei. Evaluation and Analysis of Load Balancing Mechanisms in Storage Systems [J]. Computer Science, 2013, 40(9): 55-60.
[6] . Research on Evidence Collection under Cloud Computing Environment [J]. Computer Science, 2012, 39(9): 105-108.
[7] GE Xiong-zi,FENG Dan,LU Cheng-tao,JIN Chao. Dynamic Analysis Model of Green Network Storage Systems [J]. Computer Science, 2011, 38(8): 291-296.
[8] LIU Ke,QIN Lei-hua,ZHOU Jing-li,NIE Xue-jun,ZENG Dong. Two-phrase Retrieval Strategy in Content Aware Network Storage System [J]. Computer Science, 2011, 38(5): 20-23.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!