计算机科学 ›› 2026, Vol. 53 ›› Issue (5): 354-366.doi: 10.11896/jsjkx.250500033

• 计算机体系结构 • 上一篇    下一篇

基于浮点向量压缩的分布式GMRES算法优化

陈疏桐1, 高建花2, 计卫星2, 李春峰1   

  1. 1 北京理工大学计算机学院 北京 100081
    2 北京师范大学人工智能学院 北京 100875
  • 收稿日期:2025-05-12 修回日期:2025-07-22 发布日期:2026-05-08
  • 通讯作者: 李春峰(lichunfeng@bit.edu.cn)
  • 作者简介:(ist_chen@163.com)
  • 基金资助:
    新一代人工智能国家科技重大专项(2022ZD0116311)

Optimizing Distributed GMRES Algorithm with Floating-point Vector Compression

CHEN Shutong1, GAO Jianhua2, JI Weixing2, LI Chunfeng1   

  1. 1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2 School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
  • Received:2025-05-12 Revised:2025-07-22 Online:2026-05-08
  • About author:CHEN Shutong,born in 2000,postgra-duate,is a student member of CCF(No.P4660G).His main research interest is high performance computing and applications.
    LI Chunfeng,born in 1995,Ph.D,is a member of CCF(No.P4285G).His main research interests include compu-ter architecture and high performance computing.
  • Supported by:
    National Science and Technology Major Project(2022ZD0116311).

摘要: 广义最小残差法(GMRES)是稀疏线性方程组迭代求解的主流算法之一,广泛应用于电路仿真、计算流体力学、自动驾驶等领域。为了高效求解大规模稀疏线性方程组,研究人员设计了分布式GMRES算法,通过节点间通信和分布式计算实现大规模问题的并行求解。然而,分布式GMRES算法中的稀疏矩阵向量乘(SpMV)计算引入了显著的浮点向量通信需求,导致浮点向量的通信开销成为主要性能瓶颈。基于重启动GMRES,提出了面向CPU-GPU异构系统的分布式重启动GMRES(DR-GMRES)算法,并针对算法和异构系统的特性设计了协作计算模式。在此基础上,针对SpMV计算对浮点向量的通信需求和GMRES算法内外迭代的数值特性,提出了双层混合压缩策略,包括有损与无损的混合压缩和内外迭代混合压缩,以保持一定求解精度的同时最小化DR-GMRES的通信开销。此外,设计了面向双层混合压缩策略的多缓冲区压缩通信架构,以实现高效的异步数据压缩。在16个大规模稀疏线性方程组求解问题上的实验结果表明,相比于原始分布式GMRES算法,基于混合数据压缩方法的DR-GMRES算法实现了平均3.37倍的加速比,最高实现了8.02倍的加速比,并且几乎不影响最终结果的计算精度。实际仿真验证表明,基于混合数据压缩方法的DR-GMRES算法相比原始算法在OSQP求解时间上实现了3.09倍的加速。

关键词: 广义最小残差算法, 分布式系统, 数据压缩, 通信优化

Abstract: The generalized minimal residuals(GMRES) method is one of the mainstream iterative algorithms for solving sparse linear systems,widely applied in circuit simulation,computational fluid dynamics,autonomous driving,and other fields.To efficiently solve large-scale sparse linear systems,researchers have developed distributed GMRES algorithms that use inter-node communication and distributed computing to achieve parallel solutions for large-scale problems.However,the sparse matrix-vector multiplication(SpMV) computation in distributed GMRES introduces significant communication demands for floating-point vectors,making the communication overhead of floating-point vectors a major performance bottleneck.Based on the restarted GMRES,this paper proposes a distributed restart GMRES(DR-GMRES) algorithm tailored for CPU-GPU heterogeneous systems,incorporating a collaborative computing model designed to align with the algorithm’s characteristics and heterogeneous architecture.Furthermore,for the communication requirements of floating-point vectors in SpMV and the numerical properties of GMRES’s inner-outer iterations,a two-layer hybrid compression strategy is proposed.This strategy combines lossy and lossless hybrid compression with inner-outer iteration hybrid compression to minimize the communication overhead of DR-GMRES while preserving acceptable solving accuracy.Additionally,a multi-buffer compressed communication architecture is designed to implement asynchronous data compression efficiently.Experimental results on 16 large-scale sparse linear systems demonstrate that,compared to the original distributed GMRES,the DR-GMRES algorithm based on hybrid data compression achieves an average speedup of ×3.37(with a maximum of ×8.02) with negligible impact on computational accuracy.Actual simulations experimentally verify that the DR-GMRES algorithm based on hybrid data compression achieves a ×3.09 acceleration in OSQP solving time over the baseline.

Key words: Generalized minimal residuals, Distributed system, Data compression, Communication optimization

中图分类号: 

  • TP311
[1]NARDEAN S,FERRONATO M,ABUSHAIKHA A.LinearSolvers for Reservoir Simulation Problems:An Overview and Recent Developments[J].Archives of Computational Methods in Engineering,2022,29(6):4341-4378.
[2]BARTELS R H,GOLUB G H.The Simplex Method of Linear Programming Using LU Decomposition[J].Communications of the ACM,1969,12(5):266-268.
[3]LIESEN J,STRAKOS Z.Krylov Subspace Methods:Principles and Analysis[M].Oxford:Oxford University Press,2012:268-274.
[4]SAAD Y,SCHULTZ M H.GMRES:A Generalized MinimalResidual Algorithm for Solving Nonsymmetric Linear Systems[J].SIAM Journal on Scientific and Statistical Computing,1986,7(3):856-869.
[5]YAN Z,WANG X,CHEN R.Exploring the Impact of Adenoid Obstruction on the Human Nasal Airflow Through Parallel Computational Fluid Dynamics[J].Engineering Reports,2025,7(1):e12972.
[6]CAO X,CHEN M,QI Q,et al.An Improved GMRES Method for Solving Electromagnetic Scattering Problems by MoM[J].IEEE Transactions on Antennas and Propagation,2022,70(11):10751-10757.
[7]CHEN B,CHENG J,YU W.A Practical Randomized GMRESAlgorithm for Solving Linear Equation System in Circuit Simulation[C]//Proceedings of the 30th Asia and South Pacific Design Automation Conference.New York:ACM,2025:183-189.
[8]GAO J,JI W,WANG Y.Optimization of Large-Scale SparseMatrix-Vector Multiplication on Multi-GPU Systems[J].ACM Transactions on Architecture and Code Optimization,2024,21(4):1-24.
[9]ZHANG Y,GUO S,GAO J,et al.Load Balancing Optimizations for Distributed GMRES Algorithm[C]//International Confe-rence on Algorithms and Architectures for Parallel Processing.Singapore:Springer,2024:47-56.
[10]XU Z,ALONSO J J,DARVE E.A Numerically Stable Communication-Avoiding-Step GMRES Algorithm[J].SIAM Journal on Matrix Analysis and Applications,2024,45(4):2039-2074.
[11]LINDQUIST N,LUSZCZEK P,DONGARRA J.AcceleratingRestarted GMRES with Mixed Precision Arithmetic[J].IEEE Transactions on Parallel and Distributed Systems,2021,33(4):1027-1037.
[12]LIU J,WANG Y,GAO J,et al.pSpMV:Precision-based Sparse Matrix Partition and SpMV Optimization[J].CCF Transactions on High Performance Computing,2024,6(6):549-565.
[13]GUO S Z,GAO J H,JI W X.Optimizing Distributed GMRES Algorithm with Mixed Precision[J].Computer Science,2024,51(9):15-22.
[14]ALIAGA J I,ANZT H,GRUTZMACHER T,et al.Compressed Basis GMRES on High-Performance Graphics Processing units[J].The International Journal of High Performance Computing Applications,2023,37(2):82-100.
[15]ZHANG J,DENG L,LI R,et al.Achieving High Performance and Portable Parallel GMRES Algorithm for Compressible Flow Simulations on Unstructured grids[J].The Journal of Supercomputing,2023,79(17):20116-20140.
[16]ZIANE KHODJA L,COUTURIRE R,GIERSCH A,et al.Pa-rallel Sparse Linear Solver With GMRES Method Using Minimization Techniques of Communications for GPU Clusters[J].The Journal of Supercomputing,2014,69:200-224.
[17]YANG B,LIU H,CHEN Z.Preconditioned GMRES Solver on Multiple-GPU Architecture[J].Computers & Mathematics with Applications,2016,72(4):1076-1095.
[18]MATSUMOTO K,IDOMURA Y,INA T,et al.Implementation and Performance Evaluation of a Communication-Avoiding GMRES Method for Stencil-Based Code on GPU Cluster[J].The Journal of Supercomputing,2019,75:8115-8146.
[19]LIN S,XIE Z.A Jacobi_PCG Solver for Sparse Linear Systems on Multi-GPU Cluster[J].The Journal of Supercomputing,2017,73:433-454.
[20]GHYSELS P,ASHBY T J,MEERBERGEN K,et al.HidingGlobal Communication Latency in the GMRES Algorithm on Massively Parallel Machines[J].SIAM Journal on Scientific Computing,2013,35(1):C48-C71.
[21]YAMAZAKI I,HOEMMEN M,LUSZCZEK P,et al.Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives[C]//2017 IEEE International Parallel and Distributed Processing Symposium Workshops(IPDPSW).New York:IEEE,2017:1118-1127.
[22]LI C,TANG M,TONG R,et al.P-cloth:Interactive ComplexCloth Simulation on Multi-GPU Systems Using Dynamic Matrix Assembly And Pipelined Implicit Integrators[J].ACM Transactions on Graphics,2020,39(6):1-15.
[23]MA X.An Overview of Recent Developments and Applications of the GMRES Method[J].Pure Math,2013,3:181-187.
[24]HUANG Y,DI S,YU X,et al.Cuszp:An ultra-fast Gpu Error-Bounded Lossy Compression Framework with Optimized end-to-end Performance[C]//Proceedings of the International Confe-rence for High Performance Computing,Networking,Storage and Analysis.New York:IEEE,2023:1-13.
[25]DAVIS T A,HU Y.The University of Florida Sparse Matrix Collection[J].ACM Transactions on Mathematical Software,2011,38(1):1-25.
[26]FENG M,ZHANG H.Application of Baidu Apollo Open Platform in a Course of Control Simulation Experiments[J].Computer Applications in Engineering Education,2022,30(3):892-906.
[27]STELLATO B,BANJAC G,GOULART P,et al.OSQP:AnOperator Splitting Solver for Quadratic Programs[J].Mathematical Programming Computation,2020,12(4):637-672.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!