Computer Science, 2023, Vol. 50, Issue (11): 32-40. doi: 10.11896/jsjkx.230300123

• High Performance Computing •

Study on Cross-platform Heterogeneous Parallel Computing for Lattice Boltzmann Multi-phase Flow Simulations Based on SYCL

DING Yue, XU Chuanfu, QIU Haozhong, DAI Weixi, WANG Qingsong, LIN Yongzhen, WANG Zhenghua   

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-03-14 Revised: 2023-06-12 Online: 2023-11-15 Published: 2023-11-06
  • About author: DING Yue, born in 1999, postgraduate. Her main research interests include parallel and high performance computing applications. XU Chuanfu, born in 1980, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include parallel computing and its applications.

Abstract: Heterogeneous parallel architectures are an important trend in current high-performance computing. Because different heterogeneous platforms usually support different programming models, developing cross-platform, performance-portable heterogeneous parallel applications is difficult. SYCL is an open standard for single-source, cross-platform parallel programming based on C++. Existing research on SYCL focuses mainly on performance comparisons with other parallel programming models; few studies examine the different parallel kernel implementations that SYCL provides or how to optimize their performance. To address this gap, the open-source multi-phase flow simulation software openLBMflow is implemented with the SYCL programming model for cross-platform heterogeneous parallel simulation. Performance optimization methods for SYCL parallel applications are systematically summarized by comparing three versions: the basic parallel version, the fine-grained tuned ND-range parallel version, and a version that maps many computations to one work-item (many-to-one mapping). Experiments on an Intel Xeon Platinum 9242 CPU and an NVIDIA Tesla V100 GPU show that, without additional tuning, the basic parallel kernel achieves a speedup of 2.91x on the CPU over the optimized OpenMP parallel implementation, indicating the out-of-the-box performance advantage of SYCL. Relative to the basic parallel version, the ND-range parallel version achieves up to 1.45x speedup on the CPU and 2.23x on the GPU by changing the work-group size and shape. By tuning the number and shape of lattices processed per work-item, the many-to-one mapping method achieves up to 1.57x speedup on the CPU and 1.34x on the GPU over the basic parallel version. These results indicate that, to improve performance, SYCL parallel applications are better served by the many-to-one mapping method on the CPU and by ND-range parallel kernels on the GPU.
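The three kernel styles compared in the abstract differ mainly in how lattice nodes are mapped onto work-items. The following is a minimal host-side sketch of those three index mappings in plain C++, not SYCL code and not the authors' implementation. The `Lattice` type, the function names, and the 2D layout are all hypothetical, chosen only for illustration; comments note the SYCL constructs (`sycl::range`, `sycl::nd_range`) that each loop nest is meant to mimic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D lattice (illustrative only, not taken from openLBMflow).
// visits[] counts how many times each node is updated in one sweep.
struct Lattice {
    std::size_t nx, ny;
    std::vector<int> visits;
    Lattice(std::size_t nx_, std::size_t ny_)
        : nx(nx_), ny(ny_), visits(nx_ * ny_, 0) {}
    void update(std::size_t x, std::size_t y) { ++visits[y * nx + x]; }
    bool eachNodeOnce() const {
        return std::all_of(visits.begin(), visits.end(),
                           [](int v) { return v == 1; });
    }
};

// (1) Basic parallel: one work-item per node, as with a parallel_for over a
// sycl::range<2>. Returns the number of work-items.
std::size_t basicParallel(Lattice& lat) {
    std::size_t items = 0;
    for (std::size_t y = 0; y < lat.ny; ++y)
        for (std::size_t x = 0; x < lat.nx; ++x) {
            lat.update(x, y);
            ++items;
        }
    return items;
}

// (2) ND-range: still one node per work-item, but work-items are grouped
// into wgx-by-wgy work-groups, as with sycl::nd_range<2>. Returns the
// number of work-groups.
std::size_t ndRange(Lattice& lat, std::size_t wgx, std::size_t wgy) {
    // SYCL requires the global range to be divisible by the local range.
    assert(lat.nx % wgx == 0 && lat.ny % wgy == 0);
    std::size_t groups = 0;
    for (std::size_t gy = 0; gy < lat.ny / wgy; ++gy)
        for (std::size_t gx = 0; gx < lat.nx / wgx; ++gx) {
            ++groups;
            for (std::size_t ly = 0; ly < wgy; ++ly)
                for (std::size_t lx = 0; lx < wgx; ++lx)
                    lat.update(gx * wgx + lx, gy * wgy + ly);
        }
    return groups;
}

// (3) Many-to-one: each work-item processes a bx-by-by block of nodes.
// Blocks at the domain edge are clipped. Returns the number of work-items.
std::size_t manyToOne(Lattice& lat, std::size_t bx, std::size_t by) {
    const std::size_t itemsX = (lat.nx + bx - 1) / bx;
    const std::size_t itemsY = (lat.ny + by - 1) / by;
    for (std::size_t iy = 0; iy < itemsY; ++iy)
        for (std::size_t ix = 0; ix < itemsX; ++ix) {
            const std::size_t yEnd = std::min((iy + 1) * by, lat.ny);
            const std::size_t xEnd = std::min((ix + 1) * bx, lat.nx);
            for (std::size_t y = iy * by; y < yEnd; ++y)
                for (std::size_t x = ix * bx; x < xEnd; ++x)
                    lat.update(x, y);
        }
    return itemsX * itemsY;
}
```

In this sketch, the work-group shape (`wgx`, `wgy`) and the per-work-item block shape (`bx`, `by`) stand in for the tuning parameters the abstract refers to; every mapping must still touch each lattice node exactly once, which `eachNodeOnce()` checks after a sweep.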

Key words: SYCL, Lattice Boltzmann method, Multi-phase flow simulation, Heterogeneous parallel computing, Cross-platform parallel programming model

CLC Number: TP391