Computer Science (计算机科学), 2024, Vol. 51, Issue (4): 56-66. doi: 10.11896/jsjkx.231000124
邸健强1,2, 袁良1, 张云泉1, 张思佳2
DI Jianqiang1,2, YUAN Liang1, ZHANG Yunquan1, ZHANG Sijia2
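Abstract: The Weather Research and Forecasting (WRF) model is a widely used mesoscale numerical weather prediction system that plays an important role in both atmospheric research and operational forecasting. Stencil computation is a common nested-loop computation pattern in scientific and engineering applications. In WRF, the numerical solution of the atmospheric dynamic and thermodynamic equations gives rise to a large number of complex stencil computations on spatial grids. These computations are multi-dimensional and multi-variable, involve special boundary handling in the physical model, and reflect the complexity of the physical and dynamical processes. This paper analyzes the typical stencil computation patterns in WRF in depth, identifies and abstracts the concept of an "intermediate variable" that appears in typical stencil loops, and designs and implements three optimization schemes around it: intermediate-variable computation merging, dimension-reduced storage of intermediate variables, and intermediate-variable extraction. These schemes improve data locality, increase data reuse and storage-space reuse, and reduce redundant computation and memory-access overhead. The results show that the typical stencil hotspot functions of WRF 4.2, restructured with these schemes, achieve good performance gains on both Intel and Hygon CPUs, with maximum speedups of 21.3% and 17.8%, respectively.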
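The abstract names the three transformations without showing code. The sketch below is only an illustrative reading of them, not the authors' implementation: it uses a hypothetical 2-D flux-divergence kernel (the names `q`, `u`, `msf`, `rdx`, `baseline` and `optimized` are assumptions; real WRF kernels are Fortran loops over (i, k, j)), written in plain Python so the loop structure is easy to follow. `baseline` keeps the intermediate flux in a full 2-D array computed in a separate loop nest; `optimized` merges producer and consumer into one `j` loop, stores the intermediate in a single reused 1-D row, and hoists a row-invariant factor out of the inner loop.

```python
# Hypothetical kernel for illustration only; not taken from WRF or the paper.

def baseline(q, u, msf, rdx):
    """Two separate loop nests; the intermediate `flux` is a full 2-D array."""
    nj, ni = len(q), len(q[0])
    flux = [[0.0] * ni for _ in range(nj)]
    for j in range(nj):
        for i in range(1, ni):
            flux[j][i] = 0.5 * u[j][i] * (q[j][i] + q[j][i - 1])
    tend = [[0.0] * ni for _ in range(nj)]
    for j in range(nj):
        for i in range(1, ni - 1):
            # rdx * msf[j] is recomputed at every (j, i) point
            tend[j][i] = -(rdx * msf[j]) * (flux[j][i + 1] - flux[j][i])
    return tend


def optimized(q, u, msf, rdx):
    """Same result with the three intermediate-variable transformations applied."""
    nj, ni = len(q), len(q[0])
    tend = [[0.0] * ni for _ in range(nj)]
    row = [0.0] * ni  # dimension-reduced storage: one 1-D row reused for every j
    for j in range(nj):  # computation merging: produce and consume in one j loop
        coef = rdx * msf[j]  # extraction: j-invariant factor hoisted out of the i loop
        for i in range(1, ni):
            row[i] = 0.5 * u[j][i] * (q[j][i] + q[j][i - 1])
        for i in range(1, ni - 1):
            tend[j][i] = -coef * (row[i + 1] - row[i])
    return tend
```

Both functions perform the same floating-point operations in the same order, so `baseline(q, u, msf, rdx) == optimized(q, u, msf, rdx)` holds exactly. The point is the change in working set (a 1-D row instead of a 2-D array) and in redundant arithmetic; in compiled Fortran or C such changes are what would affect cache behavior, whereas pure Python will not show the speedup.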