Computer Science ›› 2013, Vol. 40 ›› Issue (3): 116-120.

• 2012 Special Column on Multiple-Valued Logic •

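Uniprocessor Performance Tuning of a Structured Grid Based Parallel CFD Application

Che Yonggang (车永刚), Zhang Lilun (张理论), Wang Yongxian (王勇献), Xu Chuanfu (徐传福), Liu Wei (刘巍), Wang Zhenghua (王正华), Liu Huayong (刘化勇)

  1. (College of Computer, National University of Defense Technology, Changsha 410073) (State Key Laboratory of Aerodynamics, Mianyang 621000)
  • Online: 2018-11-16 Published: 2018-11-16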

Abstract: This paper optimizes a high-order-accuracy, structured-grid parallel CFD (Computational Fluid Dynamics) application from the perspective of uniprocessor performance. Performance-critical variables are identified and turned into constant parameters, which lets the compiler apply more aggressive, targeted optimizations. Based on the structure and access patterns of the application's main data structures, a multi-level data buffering technique is designed so that the main computation code accesses these data in a more favorable way, improving the spatial locality of memory accesses. Several loop transformations are also applied to improve memory access performance. Tests were run on the "Tianhe-1A" parallel computer at the National Supercomputing Center in Changsha. Relative to the original code compiled at the Intel compiler's highest optimization level, the optimized code improves serial performance by about 22.2%-28.9% on a 2D aerofoil test case with 1 million grid points, and improves parallel performance by about 13.9%-20.2% on a delta wing test case with 112 million grid points.

Key words: Parallel CFD, Uniprocessor performance tuning, Key variable parameterization, Multi-level data buffering
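
The data buffering idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function names, the flat array layout (i varying fastest), and the 3-point averaging stencil are all hypothetical, and only a single level of buffering is shown. The sketch assumes a sweep along the slowest-varying direction, where consecutive points are ni*nj elements apart; each such line is gathered into a small contiguous buffer, the stencil runs on the buffer, and the result is scattered back. Python is used here only to show the data movement; any locality benefit would appear in compiled code.

```python
# Illustrative sketch only: names, layout, and stencil are assumptions,
# not taken from the paper's application.

def idx(i, j, k, ni, nj):
    """Flat index of point (i, j, k) in an array stored with i varying fastest."""
    return i + ni * (j + nj * k)


def smooth_line(line):
    """3-point average on a contiguous 1D list; the two end points are copied."""
    out = list(line)
    for m in range(1, len(line) - 1):
        out[m] = (line[m - 1] + line[m] + line[m + 1]) / 3.0
    return out


def sweep_k_direct(field, ni, nj, nk):
    """Reference version: apply the stencil along k by strided indexing."""
    result = list(field)
    for j in range(nj):
        for i in range(ni):
            for k in range(1, nk - 1):
                result[idx(i, j, k, ni, nj)] = (
                    field[idx(i, j, k - 1, ni, nj)]
                    + field[idx(i, j, k, ni, nj)]
                    + field[idx(i, j, k + 1, ni, nj)]
                ) / 3.0
    return result


def sweep_k_buffered(field, ni, nj, nk):
    """Buffered version: gather each k-line, compute on the buffer, scatter back."""
    result = list(field)
    stride = ni * nj  # distance between consecutive k points in the flat array
    for j in range(nj):
        for i in range(ni):
            start = idx(i, j, 0, ni, nj)
            buf = field[start::stride]                 # gather nk strided values
            result[start::stride] = smooth_line(buf)   # compute, then scatter
    return result
```

Both sweeps perform the same arithmetic in the same order, so on any input they should return identical lists; the buffered version differs only in that the inner computation touches contiguous memory.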
