计算机科学 ›› 2026, Vol. 53 ›› Issue (5): 354-366.doi: 10.11896/jsjkx.250500033
陈疏桐1, 高建花2, 计卫星2, 李春峰1
CHEN Shutong1, GAO Jianhua2, JI Weixing2, LI Chunfeng1
摘要: 广义最小残差法(GMRES)是稀疏线性方程组迭代求解的主流算法之一,广泛应用于电路仿真、计算流体力学、自动驾驶等领域。为了高效求解大规模稀疏线性方程组,研究人员设计了分布式GMRES算法,通过节点间通信和分布式计算实现大规模问题的并行求解。然而,分布式GMRES算法中的稀疏矩阵向量乘(SpMV)计算引入了显著的浮点向量通信需求,导致浮点向量的通信开销成为主要性能瓶颈。基于重启动GMRES,提出了面向CPU-GPU异构系统的分布式重启动GMRES(DR-GMRES)算法,并针对算法和异构系统的特性设计了协作计算模式。在此基础上,针对SpMV计算对浮点向量的通信需求和GMRES算法内外迭代的数值特性,提出了双层混合压缩策略,包括有损与无损的混合压缩和内外迭代混合压缩,以保持一定求解精度的同时最小化DR-GMRES的通信开销。此外,设计了面向双层混合压缩策略的多缓冲区压缩通信架构,以实现高效的异步数据压缩。在16个大规模稀疏线性方程组求解问题上的实验结果表明,相比于原始分布式GMRES算法,基于混合数据压缩方法的DR-GMRES算法实现了平均3.37倍的加速比,最高实现了8.02倍的加速比,并且几乎不影响最终结果的计算精度。实际仿真验证表明,基于混合数据压缩方法的DR-GMRES算法相比原始算法在OSQP求解时间上实现了3.09倍的加速。
中图分类号:
| [1]NARDEAN S,FERRONATO M,ABUSHAIKHA A.LinearSolvers for Reservoir Simulation Problems:An Overview and Recent Developments[J].Archives of Computational Methods in Engineering,2022,29(6):4341-4378. [2]BARTELS R H,GOLUB G H.The Simplex Method of Linear Programming Using LU Decomposition[J].Communications of the ACM,1969,12(5):266-268. [3]LIESEN J,STRAKOS Z.Krylov Subspace Methods:Principles and Analysis[M].Oxford:Oxford University Press,2012:268-274. [4]SAAD Y,SCHULTZ M H.GMRES:A Generalized MinimalResidual Algorithm for Solving Nonsymmetric Linear Systems[J].SIAM Journal on Scientific and Statistical Computing,1986,7(3):856-869. [5]YAN Z,WANG X,CHEN R.Exploring the Impact of Adenoid Obstruction on the Human Nasal Airflow Through Parallel Computational Fluid Dynamics[J].Engineering Reports,2025,7(1):e12972. [6]CAO X,CHEN M,QI Q,et al.An Improved GMRES Method for Solving Electromagnetic Scattering Problems by MoM[J].IEEE Transactions on Antennas and Propagation,2022,70(11):10751-10757. [7]CHEN B,CHENG J,YU W.A Practical Randomized GMRESAlgorithm for Solving Linear Equation System in Circuit Simulation[C]//Proceedings of the 30th Asia and South Pacific Design Automation Conference.New York:ACM,2025:183-189. [8]GAO J,JI W,WANG Y.Optimization of Large-Scale SparseMatrix-Vector Multiplication on Multi-GPU Systems[J].ACM Transactions on Architecture and Code Optimization,2024,21(4):1-24. [9]ZHANG Y,GUO S,GAO J,et al.Load Balancing Optimizations for Distributed GMRES Algorithm[C]//International Confe-rence on Algorithms and Architectures for Parallel Processing.Singapore:Springer,2024:47-56. [10]XU Z,ALONSO J J,DARVE E.A Numerically Stable Communication-Avoiding-Step GMRES Algorithm[J].SIAM Journal on Matrix Analysis and Applications,2024,45(4):2039-2074. [11]LINDQUIST N,LUSZCZEK P,DONGARRA J.AcceleratingRestarted GMRES with Mixed Precision Arithmetic[J].IEEE Transactions on Parallel and Distributed Systems,2021,33(4):1027-1037. [12]LIU J,WANG Y,GAO J,et al.pSpMV:Precision-based Sparse Matrix Partition and SpMV Optimization[J].CCF Transactions on High Performance Computing,2024,6(6):549-565. [13]GUO S Z,GAO J H,JI W X.Optimizing Distributed GMRES Algorithm with Mixed Precision[J].Computer Science,2024,51(9):15-22. [14]ALIAGA J I,ANZT H,GRUTZMACHER T,et al.Compressed Basis GMRES on High-Performance Graphics Processing units[J].The International Journal of High Performance Computing Applications,2023,37(2):82-100. [15]ZHANG J,DENG L,LI R,et al.Achieving High Performance and Portable Parallel GMRES Algorithm for Compressible Flow Simulations on Unstructured grids[J].The Journal of Supercomputing,2023,79(17):20116-20140. [16]ZIANE KHODJA L,COUTURIRE R,GIERSCH A,et al.Pa-rallel Sparse Linear Solver With GMRES Method Using Minimization Techniques of Communications for GPU Clusters[J].The Journal of Supercomputing,2014,69:200-224. [17]YANG B,LIU H,CHEN Z.Preconditioned GMRES Solver on Multiple-GPU Architecture[J].Computers & Mathematics with Applications,2016,72(4):1076-1095. [18]MATSUMOTO K,IDOMURA Y,INA T,et al.Implementation and Performance Evaluation of a Communication-Avoiding GMRES Method for Stencil-Based Code on GPU Cluster[J].The Journal of Supercomputing,2019,75:8115-8146. [19]LIN S,XIE Z.A Jacobi_PCG Solver for Sparse Linear Systems on Multi-GPU Cluster[J].The Journal of Supercomputing,2017,73:433-454. [20]GHYSELS P,ASHBY T J,MEERBERGEN K,et al.HidingGlobal Communication Latency in the GMRES Algorithm on Massively Parallel Machines[J].SIAM Journal on Scientific Computing,2013,35(1):C48-C71. [21]YAMAZAKI I,HOEMMEN M,LUSZCZEK P,et al.Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives[C]//2017 IEEE International Parallel and Distributed Processing Symposium Workshops(IPDPSW).New York:IEEE,2017:1118-1127. [22]LI C,TANG M,TONG R,et al.P-cloth:Interactive ComplexCloth Simulation on Multi-GPU Systems Using Dynamic Matrix Assembly And Pipelined Implicit Integrators[J].ACM Transactions on Graphics,2020,39(6):1-15. [23]MA X.An Overview of Recent Developments and Applications of the GMRES Method[J].Pure Math,2013,3:181-187. [24]HUANG Y,DI S,YU X,et al.Cuszp:An ultra-fast Gpu Error-Bounded Lossy Compression Framework with Optimized end-to-end Performance[C]//Proceedings of the International Confe-rence for High Performance Computing,Networking,Storage and Analysis.New York:IEEE,2023:1-13. [25]DAVIS T A,HU Y.The University of Florida Sparse Matrix Collection[J].ACM Transactions on Mathematical Software,2011,38(1):1-25. [26]FENG M,ZHANG H.Application of Baidu Apollo Open Platform in a Course of Control Simulation Experiments[J].Computer Applications in Engineering Education,2022,30(3):892-906. [27]STELLATO B,BANJAC G,GOULART P,et al.OSQP:AnOperator Splitting Solver for Quadratic Programs[J].Mathematical Programming Computation,2020,12(4):637-672. |
|
||