Computer Science ›› 2024, Vol. 51 ›› Issue (9): 87-95. doi: 10.11896/jsjkx.231100016

• High Performance Computing •

Domain Analysis Based Approach to Obtain Identical Results on Varying Number of Processors for Structural Linear Static Software

TANG Dehong1,2, YANG Hao1,2, WEN Longfei1,2, XU Zhengqiu3

  1 CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China
  2 Institute of Applied Physics and Computational Mathematics, Beijing 100088, China
  3 Sichuan Zhongrui Information Technology Co., Ltd., Chengdu 610041, China
  • Received: 2023-10-31 Revised: 2024-04-10 Online: 2024-09-15 Published: 2024-09-10
  • Corresponding author: YANG Hao (yang_hao@iapcm.ac.cn)
  • About author: TANG Dehong (tang_dehong@iapcm.ac.cn), born in 1988, Ph.D., associate researcher. His main research interests include CAE, parallel computing frameworks and HPC.
    YANG Hao, born in 1990, Ph.D., assistant professor. His main research interests include CAE, parallel computing frameworks and HPC.

Abstract: Obtaining identical results on varying numbers of processors (serial-parallel consistency) is a prerequisite for trusting the results of parallel CAE software. During development, however, faults that break this consistency are often introduced. They take many forms, their symptoms are coupled with one another, and they are scattered across a very large code base, which makes serial-parallel consistency hard to achieve. Taking the consistency requirement of structural linear static software as its starting point, this paper addresses the shortcomings of the existing expert-knowledge and fault-localization approaches when they are applied to CAE software: coarse granularity, poor accuracy, high cost and lack of systematism. It introduces domain analysis and combines it with expert knowledge and dataflow state comparison, yielding an approach for structural linear static software that identifies and fixes consistency faults systematically, at fine granularity, with high accuracy and at low cost. A tool was built on this approach, and both were applied to SSTA, a structural linear static software package. Eight faults were identified and fixed, after which SSTA passed the consistency check on more than ninety real models with strictly identical serial and parallel results. The approach and tool also reduced the time needed to locate such a fault from more than two person-days on average to several person-hours.
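The abstract names dataflow state comparison as one ingredient of the approach but does not describe how the tool implements it. The following is only a minimal, hypothetical sketch of the general idea: record named checkpoints in a serial run and a parallel run, key every value by a global entity ID so that the partitioning does not matter, and report the earliest checkpoint whose values differ bitwise. The stage names (`load`, `assemble`, `solve`), the toy pipeline `run`, and the floating-point summation-order fault it exhibits are all assumptions chosen for illustration; they are not taken from SSTA or from the paper.

```python
import struct
from typing import Dict, List, Optional, Tuple

# A state maps a global entity ID (e.g. a node number) to a value.
State = Dict[int, float]
# A trace is the ordered list of (stage name, state) checkpoints of one run.
Trace = List[Tuple[str, State]]


def bits(x: float) -> bytes:
    """Exact IEEE-754 bit pattern, so comparison is bitwise, not tolerance-based."""
    return struct.pack("<d", x)


def first_divergence(serial: Trace, parallel: Trace) -> Optional[Tuple[str, List[int]]]:
    """Return (stage, differing global IDs) for the earliest differing checkpoint."""
    for (name_s, state_s), (name_p, state_p) in zip(serial, parallel):
        if name_s != name_p:
            raise ValueError(f"checkpoint mismatch: {name_s!r} vs {name_p!r}")
        ids = sorted(set(state_s) | set(state_p))
        bad = [i for i in ids
               if i not in state_s or i not in state_p
               or bits(state_s[i]) != bits(state_p[i])]
        if bad:
            return name_s, bad
    return None


def ordered_sum(values: List[float]) -> float:
    """Plain left-to-right accumulation (built-in sum() may compensate rounding)."""
    total = 0.0
    for v in values:
        total += v
    return total


def run(partitions: List[List[float]]) -> Trace:
    """Hypothetical toy pipeline: each inner list is one rank's contributions to node 7."""
    trace: Trace = []
    contributions = [c for part in partitions for c in part]
    trace.append(("load", dict(enumerate(contributions))))
    partials = [ordered_sum(part) for part in partitions]  # per-rank partial sums
    assembled = ordered_sum(partials)                      # reduction across ranks
    trace.append(("assemble", {7: assembled}))
    trace.append(("solve", {7: assembled / 2.0}))
    return trace


serial = run([[0.1, 0.2, 0.3]])      # one process
parallel = run([[0.1], [0.2, 0.3]])  # same data split over two ranks
print(first_divergence(serial, parallel))  # ('assemble', [7])
```

In this toy case the inputs agree at `load`, and the first difference appears at `assemble` because the two runs add the same numbers in a different order. Reporting the earliest divergent stage, rather than only the final result, is what narrows the search to a small region of code.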

Key words: Identical results on varying numbers of processors, Structural linear static software, Parallelization of serial codes, Domain analysis, Fault localization

CLC Number: TP311