计算机科学 ›› 2021, Vol. 48 ›› Issue (6): 10-18.doi: 10.11896/jsjkx.200700059
朱雨, 庞建民, 徐金龙, 陶小涵, 王军
ZHU Yu, PANG Jian-min, XU Jin-long, TAO Xiao-han, WANG Jun
摘要: Stencil计算是科学应用中的一类重要计算,而分块是提升Stencil计算数据局部性的关键技术。针对现有三维Stencil优化在SW26010处理器上缺少时间分块以及分块参数需手工调优的问题,引入时间分块,提出了面向SW26010处理器的三维Stencil自适应分块参数算法。通过建立性能分析模型,结合硬件计算能力及存储容量等限制因素,文中系统地分析了分块参数对模型性能的影响,判断性能瓶颈,指导分块参数的优化方向。基于性能分析模型,自适应分块参数算法可给出预测性能最优时的分块参数,有利于三维Stencil在SW26010处理器上的快速优化部署。选取了三维7点和三维27点Stencil算例进行实验。与未使用时间分块的三维Stencil优化相比,以上两个算例在自适应选择的分块参数下可以达到1.47和1.29的加速比,且实际最优分块参数与理论最佳分块参数一致,这验证了所提性能分析模型及自适应分块参数算法的有效性。
中图分类号:
[1]RIVERA G,TSENG C W.Tiling optimizations for 3D scientific computations [C]//Proceedings of the 2000 ACM/IEEE Confe-rence on Supercomputing(SC’00).Piscataway:IEEE,2000:32. [2]YUAN L,ZHANG Y,GUO P,et al.Tessellating stencils [C]//Proceedings of the International Conference for High Perfor-mance Computing,Networking,Storage and Analysis.New York:ACM,2017:1-13. [3]WIJESINGHE T,SENEVIRATHNE K,SIRIWARDHANA C,et al.Parameterized Diamond Tiling for Parallelizing stencil computations [C]//2017 Moratuwa Engineering Research Conference (MERCon).Piscataway:IEEE,2017:99-104. [4]GUO P,YUAN L,ZHANG Y,et al. Parallel Stencil Algorithm Based on Tessellating [J].Journal of Frontiers of Computer Science and Technology,2019,13(2):181-194. [5]MURANUSHI T,MAKINO J.Optimal Temporal Blocking for Stencil Computation [J].Procedia Computer Science,2015,51:1303-1312. [6]WELLEIN G,HAGER G,ZEISER T,et al.Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization [C]//2009 33rd Annual IEEE International Computer Software and Applications Conference.Piscataway:IEEE,2009:579-586. [7]FRIGO M,STRUMPEN V.Cache oblivious stencil computa-tions [C]//Proceedings of the 19th Annual International Confe-rence on Supercomputing.New York:ACM,2005:361-366. [8]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULAU,et al.Effective automatic parallelization of stencil computations [C]//Programming Language Design and Implementation.New York:ACM,2007:235-244. [9]NGUYEN A,SATISH N,CHHUGANI J,et al.3.5-D blocking optimization for stencil computations on modern CPUs and GPUs [C]//Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing,Networking,Sto-rage and Analysis(SC’10).Piscataway:IEEE,2010:1-13. [10]RAWAT P S,HONG C,RAVISHANKAR M,et al.Effectiveresource management for enhancing performance of 2D and 3D stencils on GPUs [C]//Proceedings of the 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit(GPGPU ’16).New York:ACM,2016:92-102. [11]SZUSTAK L,ROJEK K,OLAS T,et al. Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor [J].Scientific Programming,2015(5):1-14. [12]WILLIAMS S,SHALF J,OLIKER L,et al. Scientific computing kernels on the cell processor [J].International Journal of Parallel Programming,2007,35(3):263-298. [13]AO Y,YANG C,WANG X,et al.26 pflops stencil computations for atmospheric modeling on sunway taihulight [C]//2017 IEEE International Parallel and Distributed Processing Sympo-sium (IPDPS).Piscataway:IEEE,2017:535-544. [14]MENG J,SKADRON K.Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs [C]//International Conference on Supercomputing.New York:ACM,2009:256-265. [15]RAWAT P S,VAIDYA M,SUKUMARAN-RAJAM A,et al. Domain-Specific Optimization and Generation of High-Perfor-mance GPU Code for Stencil Computations [J].Proceedings of the IEEE,2018,106(11):1902-1920 . [16]DATTA K,MURPHY M,VOLKOV V,et al.Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures [C]//Proceedings of the 2008 ACM/IEEE Con-ference on Supercomputing(SC’08).Piscataway:IEEE,2008:1-12. [17]MATSUMURA K,ZOHOURI H R,WAHIB M,et al.AN5D:automated stencil framework for high-degree temporal blocking on GPUs [C]//Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization.New York:ACM,2020 . [18]UNAT D,CAI X,BADEN S B.Mint:realizing CUDA perfor-mance in 3D stencil methods with annotated C [C]//International Conference on Supercomputing.New York:ACM,2011:214-224. [19]BONDHUGULA U,BANDISHTI V,PANANILATH I.Dia-mond Tiling:Tiling Techniques to Maximize Parallelism for Stencil Computations [J].IEEE Transactions on Parallel and Distributed Systems,2017,28(5):1285-1298. [20]WU Q,NI Y,HUANG X.Regional Ocean Model Parallel Optimization in “Sunway TaihuLight” [J].Journal of Computer Research and Development,2019,56(7):1556-1566. [21]LV X J,LIU Z,CHU X S,et al. Extreme-scale SimulationBased LBM Computing Fluid Dynamics Simulations [J].Computer Science,2020,47(4):13-17. [22]SAIDI S,TENDULKAR P,LEPLEY T,et al. Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms [J].Microprocessors Microsystems,2013,37(8):848-857. [23]XU Z,LIN J,MATSUOKA S.Benchmarking sw26010 many-core processor [C]//2017 IEEE International Parallel and Distributed Processing Symposium Workshops(IPDPSW).Piscata-way:IEEE,2017:743-752. |
[1] | 何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化 Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform 计算机科学, 2021, 48(6): 34-40. https://doi.org/10.11896/jsjkx.201100051 |
[2] | 袁欣辉, 林蓉芬, 魏迪, 尹万旺, 徐金秀. 面向国产异构众核处理器SW26010的BFS优化方法 Optimization of BFS on Domestic Heterogeneous Many-core Processor SW26010 计算机科学, 2020, 47(8): 98-104. https://doi.org/10.11896/jsjkx.191000013 |
[3] | 吕小敬, 刘钊, 褚学森, 石树鹏, 孟虹松, 黄震春. 面向超大规模并行模拟的LBM计算流体力学软件 Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations 计算机科学, 2020, 47(4): 13-17. https://doi.org/10.11896/jsjkx.191000010 |
[4] | 倪鸿, 刘鑫. 非结构网格下稀疏下三角方程求解器众核优化技术研究 Many-core Optimization for Sparse Triangular Solver Under Unstructured Grids 计算机科学, 2019, 46(6A): 518-522. |
[5] | 陶小涵, 庞建民, 高伟, 王琦, 姚金阳. 基于SW26010处理器的FT程序的性能优化 Performance Optimization of FT Program Based on SW26010 Processor 计算机科学, 2019, 46(4): 321-328. https://doi.org/10.11896/j.issn.1002-137X.2019.04.050 |
[6] | 晋钢,王蕾,王志英. 异步电路的静态数据流图模型及其性能分析 Performance Evaluation Method for Asynchronous Circuit Based on Static Data Flow Structure 计算机科学, 2009, 36(12): 231-234. |
|