Computer Science ›› 2021, Vol. 48 ›› Issue (12): 43-48. doi: 10.11896/jsjkx.201200129
文敏华1, 汪申鹏1, 韦建文1, 李林颖2, 张斌2, 林新华1
WEN Min-hua1, WANG Shen-peng1, WEI Jian-wen1, LI Lin-ying2, ZHANG Bin2, LIN Xin-hua1
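Abstract: Numerical simulation of turbulent combustion is a key tool in aero-engine design. Because the Navier-Stokes (NS) equations must be solved with high-accuracy computational models, these simulations are computationally very expensive. The physical and chemical models they introduce also make the flow field extremely complex, so load balancing within the computational domain becomes the bottleneck for large-scale parallel computing. This paper therefore ports and optimizes a turbulent combustion numerical simulation method on a single server with powerful computing capability, the DGX-2. A thread allocation scheme is designed for the flux computation, and the Roofline model is used to analyze and guide the optimization directions. In addition, an efficient data communication scheme is designed and combined with the DGX-2's high-speed interconnect to implement a multi-GPU parallel version of the method. Experimental results show that, compared with the parallel version on dual-socket Intel Xeon 6248 CPUs with 40 cores, the computational part of the iteration process achieves an 8.1x performance improvement on a single V100 and a 66.1x speedup on the 16 V100 GPUs of the DGX-2, exceeding the highest performance the CPU parallel version can reach.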
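The Roofline analysis and the reported speedups can be illustrated with a minimal Python sketch. It is not the authors' code. The V100 peak figures are nominal vendor values used here as assumptions, the arithmetic intensities are hypothetical placeholders, and the function names are invented for illustration. Only the 8.1x and 66.1x speedups and the GPU count of 16 come from the abstract.

```python
def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: attainable GFLOP/s is limited either by peak compute
    or by memory bandwidth times arithmetic intensity (FLOP per byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def parallel_efficiency(speedup_multi, speedup_single, n_devices):
    """Multi-device efficiency relative to ideal linear scaling of one device."""
    return speedup_multi / (speedup_single * n_devices)


# Assumed nominal V100 figures (not taken from the paper).
V100_PEAK_FP64_GFLOPS = 7800.0
V100_HBM2_GBS = 900.0

# Intensity at which a kernel stops being bandwidth-bound.
ridge_point = V100_PEAK_FP64_GFLOPS / V100_HBM2_GBS

# Hypothetical kernel intensities, chosen only to show both regimes.
for ai in (0.5, 2.0, 16.0):
    bound = roofline_gflops(ai, V100_PEAK_FP64_GFLOPS, V100_HBM2_GBS)
    regime = "memory-bound" if ai < ridge_point else "compute-bound"
    print(f"intensity {ai:5.1f} FLOP/B -> at most {bound:7.1f} GFLOP/s ({regime})")

# Speedups reported in the abstract: 8.1x on one V100, 66.1x on 16 V100s.
efficiency = parallel_efficiency(66.1, 8.1, 16)
print(f"scaling efficiency over 16 GPUs: {efficiency:.2f}")
```

Under these assumptions, a kernel below the ridge point gains more from reduced memory traffic than from extra arithmetic throughput, which is the kind of guidance the Roofline model provides. The reported figures correspond to a scaling efficiency of about 0.51 from 1 to 16 GPUs.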