计算机科学 ›› 2024, Vol. 51 ›› Issue (12): 100-109.doi: 10.11896/jsjkx.231100060
廖启华1, 聂凯2, 韩林2, 陈梦尧1, 谢汶兵3
LIAO Qihua1, NIE Kai2, HAN Lin2, CHEN Mengyao1, XIE Wenbing3
摘要: 现有的多面体编译框架(如Pluto,LLVM/Polly和GCC/Graphite)在进行循环分块时,都采用了固定分块大小,无法充分发挥不同硬件的缓存特性,导致存在较大的性能差异。针对这一问题,涌现了许多基于多级缓存和数据局部性的循环分块算法,但这些算法往往只能优化特定循环程序或者缺乏综合考虑,不适合移植到通用编译器中。文中提出了一种基于数据局部性的循环分块选择算法,该算法不仅考虑了缓存替换策略的影响,还考虑了多核环境下的负载均衡问题。算法基于LLVM中的Polly模块实现,并选用Pluto和PolyBench中的部分测试用例进行单核和多核测试。实验结果表明,单核环境下,相比LLVM/Polly的默认分块方法,该算法在两种硬件平台下分别获得了平均2.03和2.05的加速比,且在多核环境下具有良好的并行可扩展性。
中图分类号:
[1]ABDOLLAHI-KALKHORAN A,LOTFI S,IZADKHAH H.TEA-SEA:Tiling and scheduling of non-uniform two-level perfectly nested loops using an evolutionary approach[J].Expert Systems with Applications,2022,191:116152. [2]XU R,SHA E H M,ZHUGE Q,et al.Loop interchange and ti-ling for multi-dimensional loops to minimize write operations on NVMs[J].Journal of Systems Architecture,2023,135:102799. [3]SATO Y,YUKI T,ENDO T.An autotuning framework forscalable execution of tiled code via iterative polyhedral compilation[J].ACM Transactions on Architecture and Code Optimization(TACO),2019,15(4):1-23. [4]BENABDERRAHMANE M W,POUCHET L N,COHEN A,et al.The polyhedral model is more widely applicable than you think[C]//Compiler Construction:19th International Confe-rence,CC 2010,Held as Part of the Joint European Conferences on Theory and Practice of Software,ETAPS 2010,Paphos,Cyprus,March 20-28,2010.Springer Berlin Heidelberg,2010:283-303. [5]POUCHET L N,BONDHUGULA U,BASTOUL C,et al.Loop transformations:convexity,pruning and optimization[J].ACM SIGPLAN Notices,2011,46(1):549-562. [6]KELEFOURAS V,DJEMAME K,KERAMIDAS G,et al.Ananalytical model for loop tiling transformation[C]//Embedded Computer Systems:Architectures,Modeling,and Simulation:21st International Conference,SAMOS 2021,Virtual Event,July 4-8,2021,Proceedings.Cham:Springer International Publi-shing,2022:95-107. [7]KELEFOURAS V,DJEMAME K,KERAMIDAS G,et al.AMethodology for Efficient Tile Size Selection for Affine Loop Kernels[J].International Journal of Parallel Programming,2022,50(3/4):405-432. [8]YUKI T,RENGANARAYANAN L,RAJOPADHYE S,et al.Automatic creation of tile size selection models[C]//Procee-dings of the 8thAnnual IEEE/ACM International Symposium on Code Generation and Optimization.2010:190-199. [9]BASKARAN M M,HARTONO A,TAVARAGERI S,et al.Parameterized tiling revisited[C]//Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization.2010:200-209. [10]TRIFUNOVIC K,COHEN A,EDELSOHN D,et al.Graphite two years after:First lessons learned from real-world polyhedral compilation[C]//GCC Research Opportunities Workshop(GROW'10).2010. [11]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULAU,et al.Effective automatic parallelization of stencil computations[J].ACMSigplan Notices,2007,42(6):235-244. [12]BONDHUGULA U,BANDISHTI V,PANANILATH I.Dia-mond tiling:Tiling techniques to maximize parallelism for stencil computations[J].IEEE Transactions on Parallel and Distri-buted Systems,2016,28(5):1285-1298. [13]GROSSER T,COHEN A,HOLEWINSKI J,et al.Hybrid hexagonal/classical tiling for GPUs[C]//Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization.2014:66-75. [14]MEHTA S,BEERAKA G,YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,10(4):1-27. [15]LOW T M,IGUAL F D,SMITH T M,et al.Analytical mode-ling is enough for high-performance BLIS[J].ACM Transactions on Mathematical Software(TOMS),2016,43(2):1-18. [16]MEHTA S,GARG R,TRIVEDI N,et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]//Proceedings of the 2016 International Conference on Supercompu-ting.2016:1-12. [17]SHIRAKO J,SHARMA K,FAUZIA N,et al.Analytical bounds for optimal tile size selection[C]//Compiler Construction:21st International Conference,CC 2012,Held as Part of the European Joint Conferences on Theory and Practice of Software,ETAPS 2012,Tallinn,Estonia,March 24-April 1,2012.Proceedings 21.Springer Berlin Heidelberg,2012:101-121. [18]CHAI X F,LIU S,QU B,et al.Vectorization-Friendly Tile Size Selection Algorithm[J].Computer Engineering and Applications,2020,56(15):37-42. [19]ZHU Y,PANG J M,XU J L,et al.Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor[J].Computer Science,2021,48(6):10-18. [20]BAO Y K,ZHANG P,XU X W,et al.Prediction of OptimalLoop Tiling Size for stencil Computation Based on Neural Network Model[J].Computer Science,2022,49(10):18-26. [21]QU B,LIU S,ZHANG Z Y,et al.Hexagonal Loop Tiling for Jacobi Computation Optimization Method[J].Ruan Jian Xue Bao/Journal of Software,2024,35(8):3721-3738. [22]NARASIMHAN K,ACHARYA A,BAID A,et al.A practicaltile size selection model for affine loop nests[C]//Proceedings of the ACM International Conference on Supercomputing.2021:27-39. [23]The LLVM Foundation.The LLVM Compiler Infrastructure[EB/OL].https://llvm.org/. [24]RAGHESH A,JOHANNES D,TOBIAS G,et al.LLVMFramework for High-Level Loop and Data-Locality Optimizations[EB/OL].https://polly.llvm.org/index.html. [25]LIU S,WU W G,ZHAO B,et al.Loop Tiling for Optimization of Locality and Parallelism[J].Journal of Computer Research and Development,2015,52(5):1160-1176. |
|