计算机科学 ›› 2022, Vol. 49 ›› Issue (5): 363-370.doi: 10.11896/jsjkx.220100119
纪璎芮1, 袁良2, 张云泉2
JI Ying-rui1, YUAN Liang2, ZHANG Yun-quan2
摘要: Stencil(模版计算)是一类常见的循环嵌套计算模式,被广泛应用于计算电磁、天气模拟、地球物理、海洋模拟等许多科学和工程模拟应用中。随着现代处理器体系结构的发展,多核和多层存储层次不断加深,研究并行性和局部性成为了提高程序运行速度的主要途径。分块是开发数据局部性和程序并行性的主要技术之一,目前,针对Stencil已提出了大量高效分块和向量化方法,但大多局限于具有较高并行度的Jacobi 类型的Stencil。Gauss-Seidel Stencil具有更优的收敛速度,被广泛应用于多重网格的计算中。这类Stencil的数据依赖更为复杂,文中面向红黑排序的Gauss-Seidel Stencil设计了一种并行分块和向量化算法,提升了Gauss-Seidel Stencil的数据局部性、中粒度多核并行性以及核内细粒度并行性。实验结果证实了本文方案的有效性。
中图分类号:
[1]WOLF M E,LAM M S.A data locality optimizing algorithm[C]//Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation.1991:30-44. [2]SONG Y,LI Z.New tiling techniques to improve cache temporal locality[J].ACM SIGPLAN Notices,1999,34(5):215-228. [3]RIVERA G,TSENG C W.Tiling optimizations for 3D scientific computations[C]//Proceedings of the 2000 ACM/IEEE Confe-rence on Supercomputing(SC’00).IEEE,2000:32. [4]MENG J,SKADRON K.Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs[C]//Proceedings of the 23rd International Conference on Supercomputing.2009:256-265. [5]RASTELLO F,DAUXOIS T.Efficient tiling for an ode discrete integration program:Redundant tasks instead of trapezoidal shaped-tiles[C]//Proceedings of the IEEE IPDPS’02.Pisca-taway,NJ:IEEE,2002:138. [6]NGUYEN A,SATISH N,CHHUGANI J,et al.3.5-D blocking optimization for stencil computations on modern CPUs and GPUs[C]//Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing,Networking,Sto-rage and Analysis(SC’10).IEEE,2010:1-13. [7]DING C,HE Y.A ghost cell expansion method for reducingcommunications in solving PDE problems[C]//Proceedings of the 2001 ACM/IEEE Conference on Supercomputing(SC’01).IEEE,2001:55-55. [8]JIN G,MELLOR-CRUMMEY J,FOWLER R.Increasing temporal locality with skewing and recursive blocking[C]//Proceedings of the 2001 ACM/IEEE Conference on Supercomputing(SC’01).IEEE,2001:57-57. [9]WONNACOTT D.Achieving scalable locality with time ske-wing[J].International Journal of Parallel Programming,2002,30(3):181-221. [10]MALAS T,HAGER G,LTAIEF H,et al.Multicore-optimized wavefront diamond blocking for optimizing stencil updates[J].SIAM Journal on Scientific Computing,2015,37(4):C439-C464. [11]WELLEIN G,HAGER G,ZEISER T,et al.Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization[C]//2009 33rd Annual IEEE International Computer Software and Applications Conference.IEEE,2009:579-586. [12]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation.2008:101-113. [13]BANDISHTI V,PANANILATH I,BONDHUGULA U.Tiling stencil computations to maximize parallelism[C]//Proceedings of the International Conference on High Performance Computing,Networking,Storage and Analysis(SC’12).IEEE,2012:1-11. [14]PANANILATH I,ACHARYA A,VASISTA V,et al.An optimizing code generator for a class of lattice-boltzmann computations[J].ACM Transactions on Architecture and Code Optimization (TACO),2015,12(2):1-23. [15]GROSSER T,VERDOOLAEGE S,COHEN A,et al.The relation between diamond tiling and hexagonal tiling[J].Parallel Processing Letters,2014,24(3):1441002. [16]FRIGO M,STRUMPEN V.Cache oblivious stencil computa-tions[C]//Proceedings of the 19th Annual International Conference on Supercomputing.2005:361-366. [17]FRIGO M,STRUMPEN V.The cache complexity of mul-tithreaded cache oblivious algorithms[J].Theory of Computing Systems,2009,45(2):203-233. [18]STRZODKA R,SHAHEEN M,PAJAK D,et al.Cache obli-vious parallelograms in iterative stencil computations[C]//Proceedings of the 24th ACM International Conference on Supercomputing.2010:49-59. [19]TANG Y,CHOWDHURY R A,KUSZMAUL B C,et al.The pochoir stencil compiler[C]//Proceedings of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures.2011:117-128. [20]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULA U,et al.Effective automatic parallelization of stencil computations[J].ACM Sigplan Notices,2007,42(6):235-244. [21]CUI H M,WANG L,FAN D R,et al.Landing stencil code on Godson-T[J].Journal of Computer Science and Technology,2010,25(4):886-894. [22]WONNACOTT D.Using time skewing to eliminate idle timedue to memory bandwidth and network limitations[C]//Proceedings 14th International Parallel and Distributed Processing Symposium(IPDPS 2000).IEEE,2000:171-180. [23]ZHOU X,GIACALONE J P,GARZARÁN M J,et al.Hierarchical overlapped tiling[C]//Proceedings of the Tenth International Symposium on Code Generation and Optimization.2012:207-218. [24]GROSSER T,COHEN A,KELLY P H J,et al.Split tiling for GPUs:automatic parallelization using trapezoidal tiles[C]//Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units.2013:24-31. [25]HENRETTY T,VERAS R,FRANCHETTI F,et al.A stencil compiler for short-vector simd architectures[C]//Proceedings of the 27th International ACM Conference on International Confe-rence on Supercomputing.2013:13-24. [26]STRZODKA R,SHAHEEN M,PAJAK D,et al.Cache accurate time skewing in iterative stencil computations[C]//2011 International Conference on Parallel Processing.IEEE,2011:571-581. [27]GROSSER T,COHEN A,HOLEWINSKI J,et al.Hybrid hexagonal/classical tiling for GPUs[C]//Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization.2014:66-75. [28]YUAN L,ZHANG Y,GUO P,et al.Tessellating stencils[C]//Proceedings of the International Conference for High Perfor-mance Computing,Networking,Storage and Analysis.2017:1-13. |
[1] | 左杰格, 柳晓鸣, 蔡兵. 基于图像分块与特征融合的户外图像天气识别 Outdoor Image Weather Recognition Based on Image Blocks and Feature Fusion 计算机科学, 2022, 49(3): 197-203. https://doi.org/10.11896/jsjkx.201200263 |
[2] | 苏尔. 初等稳定矩阵约化A0为上Hessenberg型的方法研究 Research on Method of Reducing A0 to Upper Hessenberg Type with Elementary Stability Matrix 计算机科学, 2021, 48(6A): 649-657. https://doi.org/10.11896/jsjkx.200800063 |
[3] | 朱雨, 庞建民, 徐金龙, 陶小涵, 王军. 面向SW26010处理器的三维Stencil自适应分块参数算法 Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor 计算机科学, 2021, 48(6): 10-18. https://doi.org/10.11896/jsjkx.200700059 |
[4] | 钱心缘, 吴文渊. 基于R-SIS和R-LWE构建的IBE加密方案 Identity-based Encryption Scheme Based on R-SIS/R-LWE 计算机科学, 2021, 48(6): 315-323. https://doi.org/10.11896/jsjkx.200700215 |
[5] | 赵会群, 吴凯锋. 一种大数据估价算法 Big Data Valuation Algorithm 计算机科学, 2020, 47(9): 110-116. https://doi.org/10.11896/jsjkx.191000156 |
[6] | 池昊宇, 陈长波. 基于神经网络的循环分块大小预测 Prediction of Loop Tiling Size Based on Neural Network 计算机科学, 2020, 47(8): 62-70. https://doi.org/10.11896/jsjkx.191200180 |
[7] | 喻露, 胡剑锋, 姚磊岳. 全局块与局部块协作的相关滤波目标跟踪算法 Correlation Filter Object Tracking Algorithm Based on Global and Local Block Cooperation 计算机科学, 2020, 47(6): 157-163. https://doi.org/10.11896/jsjkx.190500078 |
[8] | 田伟, 刘浩, 陈根龙, 宫晓蕙. 面向分块压缩感知的交叉子集导引自适应观测 Cross Subset-guided Adaptive Measurement for Block Compressive Sensing 计算机科学, 2020, 47(12): 190-196. https://doi.org/10.11896/jsjkx.200800197 |
[9] | 李春景, 胡静, 唐枝. 基于层次特征的自适应径向基插值图像放大的保真指标 Fidelity Index in Image Magnification Based on Hierarchical Feature and Radial Basis Function 计算机科学, 2019, 46(4): 254-260. https://doi.org/10.11896/j.issn.1002-137X.2019.04.040 |
[10] | 陈莉莉, 朱峰, 盛斌, 陈志华. 基于离散四元数傅里叶变换的彩色图像质量评价 Quality Evaluation of Color Image Based on Discrete Quaternion Fourier Transform 计算机科学, 2018, 45(8): 70-74. https://doi.org/10.11896/j.issn.1002-137X.2018.08.012 |
[11] | 王燕, 王双印. 基于卷积神经网络的人脸信息增强识别研究 Research on Face Information Enhancement and Recognition Based on Convolutional Neural Network 计算机科学, 2018, 45(8): 268-271. https://doi.org/10.11896/j.issn.1002-137X.2018.08.048 |
[12] | 杜秀丽, 张薇, 顾斌斌, 陈波, 邱少明. 基于灰度共生矩阵的图像自适应分块压缩感知方法 GLCM-based Adaptive Block Compressed Sensing Method for Image 计算机科学, 2018, 45(8): 277-282. https://doi.org/10.11896/j.issn.1002-137X.2018.08.050 |
[13] | 瞿中,郭阳,鞠芳蓉. 一种基于改进渗流模型的混凝土表面裂缝快速检测算法 Algorithm of Accelerated Cracks Detection Based on Improved Percolation Model in Concrete Surface Image 计算机科学, 2017, 44(1): 300-302. https://doi.org/10.11896/j.issn.1002-137X.2017.01.055 |
[14] | 曹春红,张建华,李林峰. 基于双子代差分演化和自适应分块机制的多聚焦图像融合算法 Multi-focus Image Fusion Based on Twin-generation Differential Evolution and Adaptive Block Mechanism 计算机科学, 2016, 43(7): 67-72. https://doi.org/10.11896/j.issn.1002-137X.2016.07.011 |
[15] | 王毅,金忠. 基于矩阵完整化的分块整合推荐算法 Split-Integration Recommendation Algorithm Based on Matrix Completion 计算机科学, 2016, 43(5): 219-222. https://doi.org/10.11896/j.issn.1002-137X.2016.05.040 |
|