Computer Science (计算机科学), 2023, Vol. 50, Issue 11: 32-40. doi: 10.11896/jsjkx.230300123
丁越, 徐传福, 邱昊中, 戴未希, 汪青松, 林拥真, 王正华
DING Yue, XU Chuanfu, QIU Haozhong, DAI Weixi, WANG Qingsong, LIN Yongzhen, WANG Zhenghua
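Abstract: Heterogeneous parallel architectures are an important technology trend in high-performance computing. Because different heterogeneous platforms usually support different programming models, it is very difficult to develop heterogeneous parallel applications whose performance is portable across platforms. SYCL is an open standard for single-source, cross-platform parallel programming based on C++. Existing research on SYCL mainly compares its performance with that of other parallel programming models; the different parallel kernel forms that SYCL provides, and how to optimize their performance, have received less attention. To address this gap, the open-source multiphase flow simulation software openLBMflow is ported to the SYCL programming model for cross-platform heterogeneous parallel simulation. By comparing a basic parallel version, a fine-tuned ND-range parallel version, and a many-to-one mapping of computation to work-items, the paper systematically summarizes performance optimization methods for SYCL parallel applications. Tests were run on an Intel Xeon Platinum 9242 CPU and an NVIDIA Tesla V100 GPU. Without any additional tuning, the basic parallel version achieves a speedup of 2.91 on the CPU over an optimized OpenMP implementation, which indicates that SYCL's out-of-the-box performance has some advantage. Taking the basic parallel version as the baseline, the ND-range version achieves maximum speedups of 1.45 on the CPU and 2.23 on the GPU by changing the work-group size and shape. Optimizing the many-to-one mapping of computation to work-items, that is, changing the number and shape of the lattice cells each work-item processes, yields maximum speedups of 1.57 on the CPU and 1.34 on the GPU over the same baseline. The results indicate that, to improve performance, SYCL parallel applications are better served by the many-to-one mapping on CPUs and by ND-range parallel kernels on GPUs.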
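The two tuning knobs named in the abstract can be illustrated with a minimal host-side sketch in plain C++. This is not SYCL code and not the authors' implementation; all names (`work_items_needed`, `round_up`, `Block2D`, `visit_counts`) are hypothetical and chosen only for illustration. It assumes a 2D lattice stored row-major. `round_up` reflects the SYCL rule that an ND-range global size must be divisible by the work-group size, and `visit_counts` emulates a many-to-one mapping in which each work-item processes a `bx` by `by` tile of lattice cells.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: work-items needed along one dimension when each
// work-item handles `cells_per_item` lattice cells (ceiling division).
std::size_t work_items_needed(std::size_t cells, std::size_t cells_per_item) {
    assert(cells_per_item > 0);
    return (cells + cells_per_item - 1) / cells_per_item;
}

// Hypothetical helper: pad a global size up to a multiple of the work-group
// size, since an ND-range launch needs the global size divisible by it.
std::size_t round_up(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

// Shape of the tile of lattice cells assigned to one work-item (assumption).
struct Block2D {
    std::size_t bx;  // cells per work-item in x
    std::size_t by;  // cells per work-item in y
};

// Emulates the many-to-one mapping sequentially: the two outer loops stand in
// for the parallel launch over work-items, the two inner loops are the work
// done by a single work-item. Returns how often each cell was visited.
std::vector<int> visit_counts(std::size_t nx, std::size_t ny, Block2D b) {
    std::vector<int> visits(nx * ny, 0);
    const std::size_t items_x = work_items_needed(nx, b.bx);
    const std::size_t items_y = work_items_needed(ny, b.by);
    for (std::size_t iy = 0; iy < items_y; ++iy) {
        for (std::size_t ix = 0; ix < items_x; ++ix) {
            for (std::size_t dy = 0; dy < b.by; ++dy) {
                for (std::size_t dx = 0; dx < b.bx; ++dx) {
                    const std::size_t x = ix * b.bx + dx;
                    const std::size_t y = iy * b.by + dy;
                    // Guard: edge tiles may extend past the lattice boundary.
                    if (x < nx && y < ny) {
                        ++visits[y * nx + x];
                    }
                }
            }
        }
    }
    return visits;
}
```

In this sketch, changing `Block2D` alters both the number of work-items and the memory access pattern of each one, which is the kind of parameter the abstract reports tuning; the actual kernel bodies and the best tile shapes are not given in the source.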