Computer Science, 2023, Vol. 50, Issue (11): 32-40. doi: 10.11896/jsjkx.230300123

• High Performance Computing •

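Study on Cross-platform Heterogeneous Parallel Computing for Lattice Boltzmann Multi-phase Flow Simulations Based on SYCL

DING Yue, XU Chuanfu, QIU Haozhong, DAI Weixi, WANG Qingsong, LIN Yongzhen, WANG Zhenghua

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-03-14  Revised: 2023-06-12  Online: 2023-11-15  Published: 2023-11-06
  • Corresponding author: XU Chuanfu (xuchuanfu@nudt.edu.cn)
  • Author e-mail: dingyue@nudt.edu.cn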

  • About author: DING Yue, born in 1999, postgraduate. Her main research interests include parallel and high-performance computing applications. XU Chuanfu, born in 1980, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include parallel computing and its applications.

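Abstract: Heterogeneous parallel architectures are an important technology trend in current high-performance computing. Because different heterogeneous platforms usually support different programming models, developing heterogeneous parallel applications with cross-platform performance portability is very difficult. SYCL is a single-source, cross-platform open standard for parallel programming based on C++. Current research on SYCL focuses mainly on performance comparisons with other parallel programming models, while the different parallel kernel implementations that SYCL provides, and their performance optimization, have received little study. To address this gap, we use the SYCL programming model to implement cross-platform heterogeneous parallel simulation for openLBMflow, an open-source multi-phase flow simulation code, and systematically summarize performance optimization methods for SYCL parallel applications by comparing a basic parallel version, a fine-grained tuned ND-range parallel version, and a many-to-one mapping of computation to work-items. Test results on an Intel Xeon Platinum 9242 CPU and an NVIDIA Tesla V100 GPU show that, compared with an optimized OpenMP parallel implementation and without any additional tuning, the basic parallel version achieves a 2.91x speedup on the CPU, indicating that SYCL offers a measurable out-of-the-box performance advantage. Taking the basic parallel version as the baseline, the ND-range parallel version achieves speedups of up to 1.45x on the CPU and 2.23x on the GPU by changing the work-group size and shape. By optimizing the many-to-one mapping of computation to work-items, that is, by changing the number and shape of the lattice cells processed by each work-item, speedups of up to 1.57x on the CPU and 1.34x on the GPU are achieved relative to the basic parallel version. The results indicate that, to improve performance, SYCL parallel applications benefit more from the many-to-one mapping of computation to work-items on the CPU and from ND-range parallel kernels on the GPU.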
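The ND-range version tunes the work-group size and shape. One constraint to keep in mind while tuning: SYCL 2020 expects each dimension of an `nd_range` global range to be evenly divisible by the corresponding work-group dimension, so when the lattice extents are not multiples of the chosen shape, a common pattern is to pad the global range and let the kernel skip the padding work-items. The Python sketch below only emulates that index arithmetic serially; it is not SYCL code, and the function names (`round_up`, `padded_global`, `emulate_nd_range`), the lattice size and the work-group shape are hypothetical, not taken from the authors' openLBMflow port.

```python
from itertools import product


def round_up(n: int, m: int) -> int:
    """Smallest multiple of m that is >= n (m > 0)."""
    return -(-n // m) * m


def padded_global(lattice, wg):
    """Global range padded so each dimension is a multiple of the
    matching work-group dimension, as an nd_range launch expects."""
    return tuple(round_up(n, w) for n, w in zip(lattice, wg))


def emulate_nd_range(lattice, wg):
    """Serially emulate an nd_range launch; count updates per lattice cell.

    A work-item's global id is group_id * wg + local_id. Work-items in the
    padding region are skipped by a bounds guard, mirroring an early return
    at the top of a kernel."""
    glob = padded_global(lattice, wg)
    groups = [g // w for g, w in zip(glob, wg)]
    visits = {}
    for gid in product(*(range(n) for n in groups)):
        for lid in product(*(range(w) for w in wg)):
            cell = tuple(g * w + loc for g, loc, w in zip(gid, lid, wg))
            if any(c >= n for c, n in zip(cell, lattice)):
                continue  # padding work-item: no lattice cell to update
            visits[cell] = visits.get(cell, 0) + 1
    return glob, visits


# Hypothetical lattice extents and work-group shape, for illustration only.
glob, visits = emulate_nd_range((10, 6, 5), (4, 4, 2))
print(glob)          # (12, 8, 6)
print(len(visits))   # 300
```

Every one of the 300 cells is updated exactly once, while 12 x 8 x 6 - 10 x 6 x 5 = 276 of the 576 launched work-items are idle padding; that overhead is part of what changes when a different work-group shape is chosen.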

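Key words: SYCL, Lattice Boltzmann method, Multi-phase flow simulation, Heterogeneous parallel computing, Cross-platform parallel programming model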

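In the many-to-one mapping of computation to work-items, each work-item updates several lattice cells instead of one, so the launch range shrinks by the block size in each dimension. The Python sketch below emulates one possible variant, in which every work-item owns a contiguous block of cells clipped at the lattice boundary. The abstract does not say whether the authors use contiguous or strided blocks, so this blocking scheme, the names `ceil_div`, `launch_range`, `cells_of` and `coverage`, and the sizes used are assumptions for illustration only.

```python
from itertools import product


def ceil_div(n: int, b: int) -> int:
    """Integer ceiling of n / b for positive integers."""
    return -(-n // b)


def launch_range(lattice, block):
    """Work-items per dimension when each work-item owns a block of cells."""
    return tuple(ceil_div(n, b) for n, b in zip(lattice, block))


def cells_of(item, lattice, block):
    """Cells updated by one work-item: its block, clipped at the lattice edge."""
    ranges = [range(i * b, min((i + 1) * b, n))
              for i, b, n in zip(item, block, lattice)]
    return list(product(*ranges))


def coverage(lattice, block):
    """Serially run every work-item and count how often each cell is updated."""
    rng = launch_range(lattice, block)
    counts = {}
    for item in product(*(range(r) for r in rng)):
        for cell in cells_of(item, lattice, block):
            counts[cell] = counts.get(cell, 0) + 1
    return rng, counts


# Hypothetical lattice and per-work-item block shape (not from the paper).
rng, counts = coverage((10, 6, 5), (1, 2, 4))
print(rng)           # (10, 3, 2)
print(len(counts))   # 300
```

Changing `block` alters both the number and the shape of the cells handled per work-item, the two parameters the abstract reports tuning; with a (1, 2, 4) block, 60 work-items cover all 300 cells exactly once.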


CLC number: TP391