Computer Science, 2021, Vol. 48, Issue (12): 43-48. doi: 10.11896/jsjkx.201200129

• Computer Architecture

DGX-2 Based Optimization of Application for Turbulent Combustion

WEN Min-hua1, WANG Shen-peng1, WEI Jian-wen1, LI Lin-ying2, ZHANG Bin2, LIN Xin-hua1   

  1 Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China
  2 School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2020-12-14 Revised: 2021-04-09 Online: 2021-12-15 Published: 2021-11-26
  • About the authors: WEN Min-hua, born in 1988, associate engineer, is a member of the China Computer Federation. His main research interests include engineering computing, among others.
    LIN Xin-hua, born in 1979, senior engineer, is a member of the China Computer Federation. His main research interests include performance modeling and optimization.
  • Supported by: National Key Research and Development Program of China (2016YFB0201800).

Abstract: Numerical simulation of turbulent combustion is a key tool in aeroengine design. Because high-precision models of the Navier-Stokes equations are required, these simulations demand a huge amount of computation, and the physicochemical models make the flow field extremely complicated, which turns load balancing into a bottleneck for large-scale parallelization. We port and optimize a numerical simulation method for turbulent combustion on DGX-2, a powerful computing server. We design a threading scheme for the flux calculation and use the Roofline model to guide the optimization. In addition, we design an efficient communication method and propose a multi-GPU parallel method for turbulent combustion that exploits the high-speed interconnect of DGX-2. The results show that a single V100 GPU delivers 8.1x the performance of a dual-socket Intel Xeon 6248 CPU node with 40 cores, and that the multi-GPU version running on the 16 V100 GPUs of DGX-2 achieves a 66.1x speedup, exceeding the best performance obtained on a CPU cluster.
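The Roofline model mentioned above bounds a kernel's attainable throughput by the smaller of the device's peak compute rate and its arithmetic intensity (FLOPs per byte of DRAM traffic) multiplied by memory bandwidth. The sketch below is a minimal illustration of that bound, not the authors' analysis: the device figures are assumed, approximately V100 SXM2 datasheet values (FP64 peak and HBM2 bandwidth), and the kernel's FLOP and byte counts are hypothetical placeholders.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth).

    arithmetic_intensity: FLOPs per byte moved to/from DRAM (FLOP/byte).
    peak_gflops: peak floating-point throughput (GFLOP/s).
    bandwidth_gbs: peak DRAM bandwidth (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the memory and compute roofs meet."""
    return peak_gflops / bandwidth_gbs


def is_memory_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """A kernel left of the ridge point is limited by memory bandwidth."""
    return arithmetic_intensity < ridge_point(peak_gflops, bandwidth_gbs)


# Assumed device figures (roughly V100 SXM2 FP64 peak and HBM2 bandwidth);
# replace them with measured values for a real analysis.
PEAK_FP64_GFLOPS = 7800.0
HBM_BANDWIDTH_GBS = 900.0

if __name__ == "__main__":
    # Hypothetical flux kernel: FLOP and byte counts are placeholders.
    flops = 1.2e9
    bytes_moved = 2.4e9
    ai = flops / bytes_moved
    bound = attainable_gflops(ai, PEAK_FP64_GFLOPS, HBM_BANDWIDTH_GBS)
    ridge = ridge_point(PEAK_FP64_GFLOPS, HBM_BANDWIDTH_GBS)
    kind = "memory" if is_memory_bound(ai, PEAK_FP64_GFLOPS, HBM_BANDWIDTH_GBS) else "compute"
    print(f"AI = {ai:.2f} FLOP/B, bound = {bound:.0f} GFLOP/s, "
          f"ridge = {ridge:.2f} FLOP/B, {kind}-bound")
```

With these assumed numbers, any kernel below roughly 8.7 FLOP/byte is memory-bound. For stencil-like flux computations in that regime, optimizations that reduce DRAM traffic help more than ones that reduce arithmetic.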
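The two reported speedups can be combined into a multi-GPU scaling estimate, but only under an assumption the abstract does not state explicitly: that the 8.1x single-GPU figure and the 66.1x 16-GPU figure are both measured against the same 40-core dual-socket CPU node. The sketch below shows that arithmetic. The function name is hypothetical and the shared baseline is an assumption.

```python
def scaling_from_common_baseline(single_gpu_speedup, multi_gpu_speedup, n_gpus):
    """Derive multi-GPU scaling from two speedups that share one baseline.

    Assumes both speedups are measured against the SAME reference run
    (here, hypothetically, the 40-core dual-socket CPU node).
    Returns (speedup of n_gpus over one GPU, parallel efficiency in [0, 1]).
    """
    if single_gpu_speedup <= 0 or n_gpus <= 0:
        raise ValueError("speedups and GPU count must be positive")
    relative = multi_gpu_speedup / single_gpu_speedup
    efficiency = relative / n_gpus
    return relative, efficiency


if __name__ == "__main__":
    # Figures quoted in the abstract; the common baseline is an assumption.
    rel, eff = scaling_from_common_baseline(8.1, 66.1, 16)
    print(f"16-GPU speedup over 1 GPU: {rel:.2f}x, parallel efficiency: {eff:.0%}")
```

Under that assumption, the 16 GPUs run about 8.2x faster than one GPU, which corresponds to a parallel efficiency of roughly 51%. If the 66.1x figure uses a different baseline, these derived numbers do not apply.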

Key words: CUDA, DGX-2, Navier-Stokes equation, Turbulent combustion

CLC Number: TP311.1