计算机科学 ›› 2020, Vol. 47 ›› Issue (4): 6-12.doi: 10.11896/jsjkx.191000009
刘世芳1,2, 赵永华1, 于天禹1,2, 黄荣锋1,2
LIU Shi-fang1,2, ZHAO Yong-hua1, YU Tian-yu1,2, HUANG Rong-feng1,2
摘要: 广义稠密对称特征问题的求解是许多应用科学和工程的主要任务,并且是计算电磁学、电子结构、有限元模型和量子化学等计算中的重要部分。将广义对称特征问题转化为标准对称特征问题是求解广义稠密对称特征问题的关键计算步骤。针对GPU集群,文中给出了广义稠密对称特征问题标准化块算法在GPU集群上基于MPI+CUDA的实现。为了适应GPU集群的架构,广义对称特征问题标准化算法将正定矩阵的Cholesky分解与传统的广义特征问题标准化块算法相结合,降低了标准化算法中不必要的通信开销,并且增强了算法的并行性。在基于MPI+CUDA的标准化算法中,GPU与CPU之间的数据传输操作被用来掩盖GPU内的数据拷贝操作,这消除了拷贝所花费的时间,进而提高了程序的性能。同时,文中还给出了矩阵在二维通信网格中行通信域和列通信域之间完全并行的点对点的转置算法和基于MPI+CUDA的具有多个右端项的三角矩阵方程BX=A求解的并行块算法。在中科院计算机网络信息中心的超级计算机系统“元”上,每个计算节点配置 2 块 Nvidia Tesla K20 GPGPU卡及2 颗 Intel E5-2680 V2处理器,使用多达32个GPU对不同规模矩阵的基于MPI+CUDA的广义对称特征问题标准化算法进行测试,取得了较好的加速效果与性能,并且具有良好的可扩展性。当使用32个GPU对50000×50000阶的矩阵进行测试时,峰值性能达到了约9.21Tflops。
中图分类号:
[1]KENT P C.Computational challenges of large-scale,long-time,first-principles molecular dynamics[J].Journal of Physics:Conference Series,2008,125:012058. [2]AUCKENTHALER T,BLUM V,BUNGARTZ H J,et al.Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations[J].Parallel Computing,2011,37(12):783-794. [3]SINGH D J.Introduction to the LAPW method[M]//Planewaves,Pseudopotentials and the LAPW Method.Boston,MA:Springer US,1994:35-43. [4]CHEN X S,VONG S W,LI W,et al.Noda iterations for generalized eigenproblems following Perron-Frobenius theory[J].Numerical Algorithms,2019,80(3):937-955. [5]CUCURINGU M,DAVIES P,GLIELMO A,et al.SPONGE:A generalized eigenproblem for clustering signed networks[J].arXiv:1904.08575,2019. [6]BUTKOVICˇ P,JONES D.On special cases of the generalized max-plus eigenproblem[J].SIAM Journal on Matrix Analysis and Applications,2016,37(3):1002-1021. [7]BLACKFORD L S,CHOI J,CLEARY A,et al.ScaLAPACK users’ guide[M].Philadelphia:SIAM,1997. [8]ALPATOV P,BAKER G,EDWARDS C,et al.PLAPACK Parallel Linear Algebra Package Design Overview[C]//Proceedings of the 1997 ACM/IEEE Conference on Supercomputing.IEEE,1997:29-29. [9]BAKER G,GUNNELS J,MORROW G,et al.PLAPACK:highperformance through high-level abstraction[C]// 1998 International Conference on Parallel Processing (Cat.No.98EX205).IEEE,1998. [10]IMAMURA T,YAMADA S,MACHIDA M.Development of a high-performance eigensolver on a peta-scale next-generation supercomputer system[J].Progress in Nuclear Science and Technology,2011,2:643-650. [11]POULSON J,MARKER B,GEIJN R A V D,et al.Elemental:A new framework for distributed memory dense matrix computations[J].ACM Transactions on Mathematical Software (TOMS),2013,39(2):13. [12]PETSCHOW M,PEISE E,BIENTINESI P.High-performance solvers for dense Hermitian eigenproblems[J].SIAM Journal on Scientific Computing,2013,35(1):C1-C22. [13]HENDRICKSON B,JESSUP E,SMITH C.Toward an efficient parallel eigensolver for dense symmetric matrices[J].SIAM Journal on Scientific Computing,1998,20(3):1132-1154. [14]TOMOV S,NATH R,DU P,et al.MAGMA Users’ Guide[EB/OL].ICL,UTK (November 2009).https://icl.cs.utk.edu/projectsfiles/magma/doxygen/. [15]HAIDAR A,SOLCÁR,GATES M,et al.Leading edge hybrid multi-GPU algorithms for generalized eigenproblems in electronic structure calculations[M]//Lecture Notes in Computer Science.Berlin:Springer,2013:67-80. [16]ZHAO Y H,CHI X B,CHENG Q.Efficient parallel blocked algorithms for generalized Hermitian eigenproblem[J].Journal of Computer Research and Development,2007,44(10):1724-1732. [17]LICHTENSTEIN W,JOHNSSON S L.Block-cyclic dense linear algebra[J].SIAM Journal on Scientific Computing,1993,14(6):1259-1288. [18]BOLEN J,DAVIS A,DAZEY B,et al.Massively parallel distributed computing:World's first 281 gigaflop supercomputer[EB/OL].Proceedings of the Intel Supercomputer Users Group.https://pdfs.semanticscholar.org/52ed/b836555ead0769708968adbb58a64120d34b.pdf. [19]BISSELING R H,VAN DE VORST J G G.Parallel LU decomposition on a transputer network[C]//Conference Organized by Koninklijke/Shell-Laboratory,Amsterdam.Berlin:Springer,1988:61-77. |
[1] | 许新鹏, 胡斌星. 基于ICCG法的飞行器部件强度校核快速计算方法 Fast Calculation Method of Aircraft Component Strength Check Based on ICCG 计算机科学, 2020, 47(11A): 624-627. https://doi.org/10.11896/jsjkx.191100154 |
[2] | 王梅,王莎莎,孙莺萁,宋考平,田枫,廖士中. SVRRPMCC:一种支持向量回归机的正则化路径近似算法 SVRRPMCC:A Regularization Path Approximation Algorithm of Support Vector Regression 计算机科学, 2017, 44(12): 42-47. https://doi.org/10.11896/j.issn.1002-137X.2017.12.008 |
[3] | 陆慧娟,魏莎莎,关伟,缪燕子. 基于鱼群优化算法和Cholesky分解的RELM的基因表达数据分类 Improved RELM Based on Fish Swarm Optimization Algorithm and Cholesky Decomposition for Gene Expression Data Classification 计算机科学, 2014, 41(12): 226-230. https://doi.org/10.11896/j.issn.1002-137X.2014.12.049 |
|