Computer Science, 2024, Vol. 51, Issue (4): 56-66. doi: 10.11896/jsjkx.231000124

• High Performance Computing •

Performance Optimization of Complex Stencil in Weather Forecast Model WRF

DI Jianqiang1,2, YUAN Liang1, ZHANG Yunquan1, ZHANG Sijia2   

  1 High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2 School of Information Science and Engineering, Dalian Ocean University, Dalian, Liaoning 116023, China
  • Received: 2023-10-18  Revised: 2024-02-04  Online: 2024-04-15  Published: 2024-04-10
  • Supported by: National Natural Science Foundation of China (61972376, 62072431, 62032023) and Huawei (TC20220914048).

Abstract: The Weather Research and Forecasting (WRF) model is a widely used mesoscale numerical weather prediction system that plays an important role in atmospheric research and operational forecasting. Stencil computation is a common nested-loop pattern in scientific and engineering applications. WRF performs a large number of complex stencil computations on spatial grids to solve the numerical equations of atmospheric dynamics and thermodynamics. The stencils in WRF are characterized by multiple dimensions, multiple variables, special handling of physical model boundaries, and complex physical and dynamical processes. This study analyzes the typical stencil patterns in WRF, identifies and abstracts the concept of the “intermediate variable”, and implements three optimization schemes: intermediate variable computation merging, intermediate variable dimensionality-reduction storage, and intermediate variable extraction. These schemes improve data locality, increase data reuse and spatial reuse rates, and reduce redundant computation and memory access overhead. The results show that typical hotspot functions in WRF 4.2 achieve significant performance improvements on both Intel and Hygon CPUs, with maximum speedups of 21.3% and 17.8%, respectively.
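
As a rough illustration of the first two schemes named in the abstract, the sketch below contrasts a flux-form stencil written as two separate loop nests (the intermediate flux is stored in a full 2D array) with a merged version that keeps the intermediate in two scalars. It is not taken from the WRF source or from the paper: the flux formula, the closed-boundary treatment, and the names `tendency_baseline`, `tendency_merged`, `q`, `u`, and `dx` are all assumptions chosen only to show the general idea.

```python
def tendency_baseline(q, u, dx):
    """Hypothetical baseline: the intermediate flux is a full 2D array
    filled by one loop nest and consumed by a second one."""
    nk, ni = len(q), len(q[0])
    # flux[k][i] lives on face i (between cells i-1 and i); faces 0 and ni stay 0.
    flux = [[0.0] * (ni + 1) for _ in range(nk)]
    for k in range(nk):
        for i in range(1, ni):
            flux[k][i] = 0.5 * (q[k][i] + q[k][i - 1]) * u[k][i]
    tend = [[0.0] * ni for _ in range(nk)]
    for k in range(nk):
        for i in range(ni):
            tend[k][i] = -(flux[k][i + 1] - flux[k][i]) / dx
    return tend


def tendency_merged(q, u, dx):
    """Hypothetical optimized form: both loop nests are merged and the
    intermediate flux is reduced from a 2D array to two scalars."""
    nk, ni = len(q), len(q[0])
    tend = [[0.0] * ni for _ in range(nk)]
    for k in range(nk):
        left = 0.0  # flux on face 0 (closed boundary, an assumption)
        for i in range(ni):
            if i + 1 < ni:
                right = 0.5 * (q[k][i + 1] + q[k][i]) * u[k][i + 1]
            else:
                right = 0.0  # flux on face ni (closed boundary)
            tend[k][i] = -(right - left) / dx
            left = right  # reuse this face value for the next cell
    return tend
```

In the merged version each face flux is computed once and consumed immediately, so the 2D temporary and its extra pass over memory disappear; whether this matches the transformations applied to the actual WRF hotspot functions is not something the abstract specifies.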

Key words: WRF, Stencil computation, Intermediate variable, Optimization scheme, Data locality, Hotspot function, Performance improvement

CLC Number: TP319