Computer Science, 2017, Vol. 44, Issue (4): 197-201. doi: 10.11896/j.issn.1002-137X.2017.04.043

• High Performance Computing •

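Speedup of GMRES Based on MIC Heterogeneous Cluster Platform

WANG Ming-qing, LI Ming, ZHANG Qing, ZHANG Guang-yong and WU Shao-hua

  1. State Key Laboratory of High-end Server & Storage Technology, Inspur Group, Jinan 250101; College of Mathematics, Taiyuan University of Technology, Taiyuan 030024; State Key Laboratory of High-end Server & Storage Technology, Inspur Group, Jinan 250101; State Key Laboratory of High-end Server & Storage Technology, Inspur Group, Jinan 250101; State Key Laboratory of High-end Server & Storage Technology, Inspur Group, Jinan 250101
  • Online: 2018-11-13 Published: 2018-11-13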

Abstract: The generalized minimal residual method (GMRES) is one of the most widely used methods for solving large-scale sparse nonsymmetric linear systems; it converges quickly and is numerically stable. The Intel Xeon Phi many-core coprocessor, based on the Many Integrated Core (MIC) architecture, offers high computing power and is easy to program and to port code to. In this paper, a hybrid MPI+OpenMP+offload programming model is used to port the GMRES algorithm to a MIC heterogeneous cluster platform. Several optimizations, including asynchronous hiding of inter-process collective communication, data transfer optimization, vectorization, and thread affinity optimization, greatly improve the efficiency of the solver. Finally, the parallel algorithm is applied to solving high-dimensional partial differential equations (PDEs) with the localized radial basis function (RBF) collocation method. Tests show that the parallel efficiency reaches 71.74% with 32 processes on a cluster of CPU nodes, and that four MIC cards achieve a speedup of up to 7 times over a single CPU.

Key words: GMRES, MIC, MPI, large-scale linear algebraic equations
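
For readers unfamiliar with the method named in the abstract, the following is a minimal serial sketch of restarted GMRES, assuming a small dense matrix stored as a list of rows. It is not the authors' MPI+OpenMP+offload implementation; the function names (`matvec`, `dot`, `gmres`) and the parameters `restart`, `tol`, and `max_restarts` are illustrative choices. The comments mark the two kernels that dominate the cost: the matrix-vector product and the inner products, which in a distributed setting would become global reductions of the kind the abstract describes hiding.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows (illustrative storage)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product; in a distributed code this would need a global reduction."""
    return sum(a * b for a, b in zip(u, v))

def gmres(A, b, restart=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m) for A x = b, starting from x = 0 (serial sketch)."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta / b_norm < tol:
            break
        V = [[ri / beta for ri in r]]   # orthonormal Krylov basis vectors
        R = []                          # columns of the rotated (triangular) Hessenberg matrix
        cs, sn = [], []                 # Givens rotation coefficients
        g = [beta]                      # rotated right-hand side of the least-squares problem
        for j in range(restart):
            w = matvec(A, V[j])         # kernel 1: matrix-vector product
            h = []
            for i in range(j + 1):      # kernel 2: modified Gram-Schmidt inner products
                hij = dot(w, V[i])
                h.append(hij)
                w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            for i in range(j):          # apply earlier rotations to the new column
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            denom = math.hypot(h[j], h_next)
            if denom == 0.0:
                break                   # no progress possible in this cycle
            c, s = h[j] / denom, h_next / denom
            cs.append(c)
            sn.append(s)
            h[j] = denom                # rotation eliminates the subdiagonal entry
            g.append(-s * g[j])
            g[j] = c * g[j]
            R.append(h)
            # |g[j+1]| equals the residual norm of the current least-squares solution
            if abs(g[j + 1]) / b_norm < tol or h_next == 0.0:
                break
            V.append([wk / h_next for wk in w])
        k = len(R)
        y = [0.0] * k
        for i in range(k - 1, -1, -1):  # back substitution on the triangular system
            acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
            y[i] = acc / R[i][i]
        x = [xi + sum(y[col] * V[col][i] for col in range(k)) for i, xi in enumerate(x)]
    return x

# Small nonsymmetric example whose exact solution is [1, 2, 3].
A_demo = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b_demo = [6.0, 15.0, 11.0]
x_demo = gmres(A_demo, b_demo)
```

Each inner iteration performs one matrix-vector product and a growing number of inner products. In a row-partitioned parallel version every inner product requires a collective reduction across processes, which is why overlapping those collectives with computation can matter for scalability.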

