Computer Science, 2020, Vol. 47, Issue (4): 13-17. doi: 10.11896/jsjkx.191000010

• Computer Architecture •

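Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations

LV Xiao-jing1, LIU Zhao2, CHU Xue-sen3, SHI Shu-peng1, MENG Hong-song1, HUANG Zhen-chun2

  1 National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214072, China;
    2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
    3 China Ship Scientific Research Center, Wuxi, Jiangsu 214072, China
  • Received: 2019-09-08 Online: 2020-04-15 Published: 2020-04-15
  • Corresponding author: LIU Zhao (liuz18@mails.tsinghua.edu.cn)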

  • Contact: LIU Zhao, born in 1986, Ph.D., is a member of the China Computer Federation. His main research interests include high-performance computing and computer architecture.
  • About author: LV Xiao-jing, born in 1989, postgraduate, is a member of the China Computer Federation. Her main research interests include parallel algorithms and applications.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2017YFB0203602).

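Abstract: The lattice Boltzmann method (LBM) is a computational fluid dynamics (CFD) method based on mesoscopic-scale simulation and is widely used in both theoretical research and engineering. Improving the parallel simulation capability of LBM CFD software is an important topic in high-performance computing and its applications. Based on the Sunway TaihuLight supercomputing system, this work designs and implements SWLBM, a highly efficient and scalable LBM CFD software package. Targeting the architecture of the domestic many-core processor SW26010, the paper designs several multi-level parallelization techniques that improve the simulation speed and scalability of SWLBM: data reuse for the 19-point stencil, vectorization of the collision step, and asynchronous parallelism between the master and slave cores that hides communication behind computation. With these optimizations, numerical simulations with up to 5.6 trillion grid cells were tested; SWLBM sustains 4.7 PFlops of floating-point performance and improves simulation speed by a factor of 172. Taking a 10000×10000×5000-grid wind-field simulation on one million cores as the baseline, SWLBM reaches 87% parallel efficiency on the full machine with over ten million cores. The test results show that SWLBM can provide practical large-scale parallel simulation solutions for industrial applications.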

Key words: SW26010, Parallel optimization, Multi-level parallelism, Lattice Boltzmann method



CLC number: TP391
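The optimizations listed in the abstract all target the core LBM update. As a point of reference, here is a minimal serial sketch of one D3Q19 time step. It assumes the standard single-relaxation-time (BGK) collision operator, periodic boundaries, and pull-style streaming; these choices, the relaxation time `tau`, and the helper names `equilibrium`, `step`, and `run` are illustrative assumptions, not SWLBM's actual implementation. The sketch shows why the update is a 19-point stencil: every cell gathers one population from each of its 19 neighbours (itself included) before colliding.

```python
# D3Q19 velocity set: rest, 6 axis neighbours, 12 face diagonals.
C = [(0, 0, 0),
     (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
     (1, 1, 0), (-1, -1, 0), (1, -1, 0), (-1, 1, 0),
     (1, 0, 1), (-1, 0, -1), (1, 0, -1), (-1, 0, 1),
     (0, 1, 1), (0, -1, -1), (0, 1, -1), (0, -1, 1)]
# Standard D3Q19 weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal).
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def equilibrium(rho, u):
    """Second-order BGK equilibrium f_i^eq(rho, u) for D3Q19 (c_s^2 = 1/3)."""
    uu = u[0] ** 2 + u[1] ** 2 + u[2] ** 2
    feq = []
    for (cx, cy, cz), w in zip(C, W):
        cu = cx * u[0] + cy * u[1] + cz * u[2]
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * uu))
    return feq


def step(f, n, tau):
    """One pull-stream + BGK-collide step on a periodic n*n*n lattice.

    f maps (x, y, z) -> list of 19 populations. Each new cell reads one
    population from each of its 19 stencil neighbours (x - c_i).
    """
    new = {}
    for (x, y, z) in f:
        # Streaming (pull): gather population i from the upwind neighbour.
        g = [f[((x - cx) % n, (y - cy) % n, (z - cz) % n)][i]
             for i, (cx, cy, cz) in enumerate(C)]
        # Macroscopic moments: density and velocity.
        rho = sum(g)
        u = [sum(gi * c[d] for gi, c in zip(g, C)) / rho for d in range(3)]
        # Collision: relax every population towards its equilibrium.
        feq = equilibrium(rho, u)
        new[(x, y, z)] = [gi - (gi - ei) / tau for gi, ei in zip(g, feq)]
    return new


def run(n=4, tau=0.8, steps=5):
    """Fluid at rest with a small density bump in one cell (hypothetical setup)."""
    f = {(x, y, z): equilibrium(1.0, [0.0, 0.0, 0.0])
         for x in range(n) for y in range(n) for z in range(n)}
    f[(0, 0, 0)] = equilibrium(1.1, [0.0, 0.0, 0.0])
    for _ in range(steps):
        f = step(f, n, tau)
    return f
```

On a 4×4×4 lattice, `run()` should conserve total mass (63 × 1.0 + 1.1 = 64.1) and keep total momentum at zero up to rounding, since BGK collision and periodic streaming both conserve these moments; this is a cheap sanity check for any reimplementation. On SW26010, the neighbour gather in `step` is where stencil data reuse and vectorization of the collision would apply, because each slave core (CPE) has only a 64 KB local store (LDM).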