Computer Science, 2020, Vol. 47, Issue (1): 24-30. doi: 10.11896/jsjkx.181102176

• Computer Architecture •

Research on Adaptation of CFD Software Based on Many-core Architecture of 100P Domestic Supercomputing System

LI Fang1, LI Zhi-hui2, XU Jin-xiu1, FAN Hao1, CHU Xue-sen3, LI Xin-liang4

  1. Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083, China
  2. National Laboratory of Computational Fluid Dynamics, Beijing 100191, China
  3. China Ship Scientific Research Center, Wuxi, Jiangsu 214081, China
  4. Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-11-26  Published: 2020-01-19
  • Corresponding author: LI Zhi-hui (zhli0097@x263.net)
  • About author: LI Fang, born in 1980, Ph.D., associate researcher. Her main research interests include computational fluid dynamics and high-performance parallel computing and applications. LI Zhi-hui, born in 1968, Ph.D., professor, doctoral supervisor. His main research interests include computable modeling of the nonlinear deformation and failure mechanisms of metal truss structures, numerical prediction of flight trajectories, and high-performance parallel computing and applications.
  • Supported by:
    This work was supported by the Project of Manned Space Engineering Technology (2018-14), the Major Research Plan of the National Natural Science Foundation of China (91530319) and the National Basic Research Program of China (2014CB744100).

Abstract: The domestic many-core processor provides two many-core parallel programming languages that differ considerably in porting difficulty. How well a given CFD code adapts to the many-core architecture determines which of the two languages suits it during porting and optimization. This paper first introduces the architecture, programming model and parallel programming languages of the domestic many-core processor. It then analyzes the challenges that CFD software faces on this processor, including the data dependencies introduced by implicit schemes, the solution of large sparse systems of linear equations, multigrid methods and unstructured grids, all of which limit a code's adaptability to the many-core architecture. Innovative optimization algorithms are proposed for each of these difficulties, and conclusions on the many-core adaptability of several typical CFD codes, including their speedup ratios, are drawn from theoretical analysis and experiments. Practice shows that most CFD software adapts well to the domestic many-core processor: it can be ported automatically with the OpenACC compiler and scaled to a million cores while maintaining high parallel efficiency.
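One of the challenges listed above, the data dependency created by implicit schemes, can be illustrated with a one-dimensional model problem. The sketch below is not the authors' algorithm; it assumes a hypothetical test case, -u'' = 1 on (0,1) with u(0) = u(1) = 0, and compares three relaxation sweeps. A Jacobi sweep reads only the previous iterate, so its iterations are independent and can carry a parallel-loop directive. A lexicographic Gauss-Seidel sweep, typical of implicit smoothers, reads the value of u[i-1] written earlier in the same sweep, a loop-carried dependency that rules this out. Red-black reordering, a common generic remedy, restores independence within each half-sweep. The `#pragma acc` lines follow the standard OpenACC specification for illustration only and may differ from the directives accepted by the domestic processor's compiler; all function names and size limits are hypothetical.

```c
#include <assert.h>

/* Hypothetical model problem: -u'' = 1 on (0,1), u(0) = u(1) = 0,
   n interior points, h = 1/(n+1). The finite-difference solution equals
   the exact solution u(x) = x(1-x)/2, so the error is easy to measure. */
#define MAX_N 256

/* Jacobi: unew[i] depends only on the previous iterate u, so all i are
   independent and the loop can be marked parallel. */
static void jacobi_sweep(int n, const double *u, double *unew, double h2) {
    #pragma acc parallel loop copyin(u[0:n+2]) copyout(unew[1:n])
    for (int i = 1; i <= n; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1] + h2);
}

/* Lexicographic Gauss-Seidel: u[i] reads u[i-1] written in this same sweep,
   a loop-carried dependency, so no parallel directive is legal here. */
static void gauss_seidel_sweep(int n, double *u, double h2) {
    for (int i = 1; i <= n; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h2);
}

/* Red-black Gauss-Seidel: odd points read only even points and vice versa,
   so each half-sweep is free of loop-carried dependencies. */
static void red_black_sweep(int n, double *u, double h2) {
    for (int first = 1; first <= 2; first++) {
        #pragma acc parallel loop copy(u[0:n+2])
        for (int i = first; i <= n; i += 2)
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h2);
    }
}

/* method: 0 = Jacobi, 1 = Gauss-Seidel, 2 = red-black.
   Starts from u = 0 and returns the max-norm error after `sweeps` sweeps. */
static double solve_error(int method, int n, int sweeps) {
    assert(n >= 1 && n <= MAX_N);
    double u[MAX_N + 2] = {0.0}, tmp[MAX_N + 2] = {0.0};
    double h = 1.0 / (n + 1), h2 = h * h;
    for (int s = 0; s < sweeps; s++) {
        if (method == 0) {
            jacobi_sweep(n, u, tmp, h2);
            for (int i = 1; i <= n; i++) u[i] = tmp[i];
        } else if (method == 1) {
            gauss_seidel_sweep(n, u, h2);
        } else {
            red_black_sweep(n, u, h2);
        }
    }
    double err = 0.0;
    for (int i = 1; i <= n; i++) {
        double x = i * h;
        double d = u[i] - 0.5 * x * (1.0 - x);
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

A compiler without OpenACC support ignores the pragmas, so the code also runs serially. For n = 16, a few thousand sweeps of any variant reach the discrete solution to rounding error, while after a fixed 200 sweeps both Gauss-Seidel variants are already much closer to it than Jacobi, which illustrates why reordering to recover parallelism is more attractive than simply falling back to Jacobi.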

Key words: Adaptability, Domestic, Many-core architecture, Parallel algorithm, Programming language, Computational fluid dynamics software
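The unstructured-grid challenge named in the abstract has a different root cause. Residuals on such grids are typically assembled by looping over edges and scattering each edge flux to the edge's two end nodes through indirect addressing, so two cores processing edges that share a node would race on the same memory location. A common generic remedy, sketched below under stated assumptions and not claimed to be the authors' method, is to color the edges so that no two edges of one color share a node and to parallelize only within a color. The tiny mesh, the diffusion-like flux, the size limits and all function names are hypothetical, and the `#pragma acc` directive follows standard OpenACC syntax rather than any processor-specific dialect.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 1024
#define MAX_COLORS 64 /* greedy needs at most 2*maxdegree - 1 colors */

/* Greedy edge coloring: each edge gets the smallest color not yet used by
   any edge sharing one of its two end nodes, so no two edges of the same
   color touch the same node. Edge e joins nodes e2n[2*e] and e2n[2*e+1].
   Returns the number of colors used. */
static int color_edges(int nedges, int nnodes, const int *e2n, int *color) {
    static unsigned char used[MAX_NODES][MAX_COLORS]; /* used[v][c]: v touches color c */
    assert(nnodes <= MAX_NODES);
    memset(used, 0, sizeof used);
    int ncolors = 0;
    for (int e = 0; e < nedges; e++) {
        int a = e2n[2 * e], b = e2n[2 * e + 1];
        int c = 0;
        while (c < MAX_COLORS && (used[a][c] || used[b][c])) c++;
        assert(c < MAX_COLORS);
        color[e] = c;
        used[a][c] = used[b][c] = 1;
        if (c + 1 > ncolors) ncolors = c + 1;
    }
    return ncolors;
}

/* Edge-based residual assembly with a diffusion-like flux f = q[b] - q[a]
   (a stand-in for a real numerical flux), scattered to both end nodes.
   Processing one color at a time means the edges handled by the inner loop
   never share a node, so the scatter has no write conflicts. A production
   code would sort edges by color instead of scanning all edges per color. */
static void assemble_colored(int nedges, int nnodes, const int *e2n,
                             const int *color, int ncolors,
                             const double *q, double *res) {
    for (int v = 0; v < nnodes; v++) res[v] = 0.0;
    for (int c = 0; c < ncolors; c++) {
        #pragma acc parallel loop copyin(e2n[0:2*nedges], color[0:nedges], q[0:nnodes]) copy(res[0:nnodes])
        for (int e = 0; e < nedges; e++) {
            if (color[e] != c) continue;
            int a = e2n[2 * e], b = e2n[2 * e + 1];
            double f = q[b] - q[a];
            res[a] += f;
            res[b] -= f;
        }
    }
}
```

For a unit square split into two triangles (nodes 0 to 3, edges 0-1, 1-2, 2-3, 3-0 and the diagonal 0-2), the greedy pass uses three colors, the minimum possible since nodes 0 and 2 each touch three edges. The colored assembly produces the same residual as a plain serial edge loop up to floating-point rounding, because only the order of the additions changes.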

CLC number: TP311