计算机科学 ›› 2025, Vol. 52 ›› Issue (12): 125-132.doi: 10.11896/jsjkx.250200062

• 数据库&大数据&数据科学 • 上一篇    下一篇

面向并发图分析的局部性感知的缓存管理策略

李汉桥1, 赵苑君2   

  1. 1 武昌首义学院基础科学部 武汉 430064
    2 华中科技大学 武汉 430074
  • 收稿日期:2025-02-17 修回日期:2025-04-02 出版日期:2025-12-15 发布日期:2025-12-09
  • 通讯作者: 赵苑君(138250205@qq.com)
  • 作者简介:(48250591@qq.com)

Locality-aware Cache Management Strategy for Concurrent Graph Analysis

LI Hanqiao1, ZHAO Yuanjun2   

  1. 1 Basic Science Department, Wuchang Shouyi University, Wuhan 430064, China
    2 Huazhong University of Science and Technology, Wuhan 430074, China
  • Received:2025-02-17 Revised:2025-04-02 Published:2025-12-15 Online:2025-12-09
  • About author:LI Hanqiao,born in 1983,master,associate professor.His main research in-terests include federated learning,graph neural networks and augmented reality.
    ZHAO Yuanjun,born in 1988,master.Her main research interest is software system.

摘要: 随着图计算技术的蓬勃发展,现有图平台上通常运行着大量的并发图分析任务以获得数据背后的价值。因此,并发图计算技术被广泛应用于智能教育、公共管理和新闻媒体等领域。然而,目前图计算系统大多为执行单个图分析任务设计,在支持并发图分析任务时存在大量冗余数据访问。尽管一些工作已经观察到这一问题,并试图利用其中的时间和空间局部性共享底层图数据减少冗余数据访问,但是其忽视了私有状态数据更新访问的数据局部性,依然面临着缓存利用率低的问题,导致系统吞吐率低。为此,提出了面向并发图分析的局部性感知的缓存管理策略CCG,以充分感知并发图分析任务之间的时间和空间局部性,减少冗余数据访问和同步开销。具体而言,该策略通过高效缓存数据的更新并以增量的方式合并冗余更新,利用并发图分析任务的数据局部性,实现内存数据的高效批量更新,减少数据访问开销并避免缓存抖动,有效提升并发图分析任务的吞吐率。同时,高效利用多级缓存进行分层缓冲与合并,让并发图分析任务在更新访问私有数据时避免同步开销和锁开销,进一步提升系统吞吐率。实验结果显示,在用目前最新的并发图计算系统Glign运行并发图分析任务时,相比于现有最好的图计算缓存策略GRASP,CCG可以将系统吞吐率提升2.3~7.8倍。

关键词: 并发图分析, 缓存架构, 数据局部性, 缓存抖动, 吞吐率

Abstract: With the rapid growth of graph computing,modern graph platforms routinely execute a large number of concurrent graph analytics tasks to extract the latent value in massive datasets.Consequently,concurrent graph processing has been widely adopted in domains,including intelligent education,public administration,and news media.However,most existing graph proces-sing systems are originally designed for single-task execution and suffer from excessive redundant data accesses when handling concurrent workloads.Although prior studies have observed significant redundancy in in-memory graph data across concurrent tasks and have attempted to exploit temporal and spatial locality to share underlying graph data,they largely overlook the data locality in private state updates.This limitation leads to low cache utilization and,ultimately,degraded system throughput.To address this challenge,this paper proposes CCG,a locality-aware cache management strategy for concurrent graph analysis,which fully exploits both temporal and spatial locality across tasks to reduce redundant data accesses and synchronization overhead.Specifically,CCG efficiently buffers and incrementally merges redundant updates,leveraging data locality to perform high-throughput batch updates in memory.This design minimizes access costs,mitigates cache thrashing,and significantly improves concurrency performance.Moreover,CCG employs a multi-level cache hierarchy to enable layered buffering and merging,thereby eliminating synchronization and locking overhead during private state updates.Experimental results show that CCG improves system throughput by 2.3×~7.8× over GRASP.

Key words: Concurrent graph analysis, Cache architecture, Data locality, Cache thrashing, Throughput

中图分类号: 

  • TP391
[1]PAGE L,BRIN S,MOTWANI R,et al.The PageRank citation ranking:bringing order to the web[R].Stanford:Stanford InfoLab,1998.
[2]ZHANG Y,GAO Q,GAO L,et al.Maiter:An asynchronousgraph processing framework for delta-based accumulative iterative computation[J].IEEE Transactions on Parallel and Distri-buted Systems,2013,25(8):2091-2100.
[3]SHUN J,BLELLOCH G E.Ligra:a lightweight graph processing framework for shared memory[C]//Proceedings of the 2013 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.2013:135-146.
[4]QIAO S J,GUO J,HAN N,et al.Parallel Algorithm for Discovering Communities in Large-Scale Complex Netwoks[J].Chinese Journal of Computers,2017,40(3):687-700.
[5]YANG K,ZHANG M X,CHEN K,et al.Knightking:a fast distributed graph random walk engine[C]//Proceedings of the 2019 ACM Symposium on Operating Systems Principles.2019:524-537.
[6]ZHANG Y,LIAO X,GU L,et al.AsynGraph:Maximizing data parallelism for efficient iterative graph processing on GPUs[J].ACM Transactions on Architecture and Code Optimization(TACO),2020,17(4):1-21.
[7]LOW Y,GONZALEZ J,KYROLA A,et al.Distributed gra-phlab:A framework for machine learning in the cloud[J].ar-Xiv:1204.6078,2012.
[8]XUE J,YANG Z,QU Z,et al.Seraph:an efficient,low-cost system for concurrent graph processing[C]//Proceedings of the 2014 International Symposium on High-performance Parallel and Distributed Computing.2014:227-238.
[9]ZHANG Y,LIAO X,JIN H,et al.{CGraph}:A correlations-aware approach for efficient concurrent iterative graph processing[C]//Proceedings of the 2018 USENIX Annual Technical Conference.2018:441-452.
[10]ZHAO J,ZHANG Y,LIAO X,et al.GraphM:an efficient sto-rage system for high throughput of concurrent graph processing[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2019:1-14.
[11]CHEN H,SHEN M,XIAO N,et al.Krill:a compiler and runtime system for concurrent graph processing[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2021:1-16.
[12]YIN X,ZHAO Z,GUPTA R.Glign:Taming misaligned graphtraversals in concurrent graph processing[C]//Proceedings of the 2022 ACM International Conference on Architectural Support for Programming Languages and Operating Systems.2022:78-92.
[13]PARK J S,PENNER M,PRASANNA V K.Optimizing graph algorithms for improved cache performance[J].IEEE Transactions on Parallel and Distributed Systems,2004,15(9):769-782.
[14]SAFRO I,TEMKIN B.Multiscale approach for the networkcompression-friendly ordering[J].Journal of Discrete Algorithms,2011,9(2):190-202.
[15]BANERJEE J,KIM W,KIM S J,et al.Clustering a DAG for CAD Databases[J].IEEE Transactions on Software Enginee-ring,1988,14(11):1684-1699.
[16]BALAJI V,CRAGO N,JALEEL A,et al.P-opt:Practical optimal cache replacement for graph analytics[C]//Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture.IEEE,2021:668-681.
[17]JALEEL A,THEOBALD K B,STEELY JR S C,et al.Highperformance cache replacement using re-reference interval prediction(RRIP)[J].ACM SIGARCH Computer Architecture News,2010,38(3):60-71.
[18]FALDU P,DIAMOND J,GROT B.Domain-specialized cachemanagement for graph analytics[C]//Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture.IEEE,2020:234-248.
[19]HAM T J,WU L,SUNDARAM N,et al.Graphicionado:Ahigh-performance and energy-efficient accelerator for graph analytics[C]//Proceedings of the 2016 Annual IEEE/ACM International Symposium on Microarchitecture.IEEE,2016:1-13.
[20]NGUYEN D,LENHARTH A,PINGALI K.A lightweight infrastructure for graph analytics[C]//Proceedings of the 2013 ACM Symposium on Operating Systems Principles.2013:456-471.
[21]SANCHEZ D,KOZYRAKIS C.ZSim:Fast and accurate microarchitectural simulation of thousand-core systems[J].ACM SIGARCH Computer architecture news,2013,41(3):475-486.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!