Computer Science ›› 2024, Vol. 51 ›› Issue (12): 100-109.doi: 10.11896/jsjkx.231100060

• High Performance Computing • Previous Articles     Next Articles

Tile Selection Algorithm Based on Data Locality

LIAO Qihua1, NIE Kai2, HAN Lin2, CHEN Mengyao1, XIE Wenbing3   

  1. 1 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
    2 National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450000, China
    3 Wuxi Advanced Technology Research Institute, Wuxi, Jiangsu 214000, China
  • Received:2023-11-09 Revised:2024-04-21 Online:2024-12-15 Published:2024-12-10
  • About author:LIAO Qihua,born in 1997,postgra-duate.His main research interests include compiler optimization and high-performance computing.
    HAN Lin,born in 1978,Ph.D,associate professor,is a senior member of CCF(No.16416M).His main research interests include compiler optimization and high-performance computing.
  • Supported by:
    Major Science and Technology Special Projects in Henan Province for 2022(221100210600) and Seeking Truth and Science Research Fund for 2022(32213247).

Abstract: The existing polyhedral compilation frameworks(such as Pluto,LLVM/Poly and GCC/Graphite) use fixed block sizes when performing loop tiling,which cannot fully utilize the caching characteristics of different hardware,resulting in significant performance differences.In response to this issue,many loop tiling algorithms based on multi-level caching and data locality have emerged,but these algorithms often only optimize specific loop programs or lack comprehensive consideration,and are not suitable for transplantation into general compilers.This paper proposes a tile size selection algorithm based on data locality,which not only considers the impact of cache replacement strategy,but also considers the load balancing problem in multi-core environments.The algorithm is implemented based on the Polly module in LLVM,and some test cases from Pluto and PolyBench are selected for single core and multi-core testing.The experimental results show that compared to the default partitioning method of LLVM/Polly,the proposed algorithm achieves an average acceleration ratio of 2.03 and 2.05 on two hardware platforms in a single core environment,and has good parallel scalability in a multi-core environment.

Key words: Data locality, Polyhedral model, Loop tiling, Tile size, Load balancing

CLC Number: 

  • TP311
[1]ABDOLLAHI-KALKHORAN A,LOTFI S,IZADKHAH H.TEA-SEA:Tiling and scheduling of non-uniform two-level perfectly nested loops using an evolutionary approach[J].Expert Systems with Applications,2022,191:116152.
[2]XU R,SHA E H M,ZHUGE Q,et al.Loop interchange and ti-ling for multi-dimensional loops to minimize write operations on NVMs[J].Journal of Systems Architecture,2023,135:102799.
[3]SATO Y,YUKI T,ENDO T.An autotuning framework forscalable execution of tiled code via iterative polyhedral compilation[J].ACM Transactions on Architecture and Code Optimization(TACO),2019,15(4):1-23.
[4]BENABDERRAHMANE M W,POUCHET L N,COHEN A,et al.The polyhedral model is more widely applicable than you think[C]//Compiler Construction:19th International Confe-rence,CC 2010,Held as Part of the Joint European Conferences on Theory and Practice of Software,ETAPS 2010,Paphos,Cyprus,March 20-28,2010.Springer Berlin Heidelberg,2010:283-303.
[5]POUCHET L N,BONDHUGULA U,BASTOUL C,et al.Loop transformations:convexity,pruning and optimization[J].ACM SIGPLAN Notices,2011,46(1):549-562.
[6]KELEFOURAS V,DJEMAME K,KERAMIDAS G,et al.Ananalytical model for loop tiling transformation[C]//Embedded Computer Systems:Architectures,Modeling,and Simulation:21st International Conference,SAMOS 2021,Virtual Event,July 4-8,2021,Proceedings.Cham:Springer International Publi-shing,2022:95-107.
[7]KELEFOURAS V,DJEMAME K,KERAMIDAS G,et al.AMethodology for Efficient Tile Size Selection for Affine Loop Kernels[J].International Journal of Parallel Programming,2022,50(3/4):405-432.
[8]YUKI T,RENGANARAYANAN L,RAJOPADHYE S,et al.Automatic creation of tile size selection models[C]//Procee-dings of the 8thAnnual IEEE/ACM International Symposium on Code Generation and Optimization.2010:190-199.
[9]BASKARAN M M,HARTONO A,TAVARAGERI S,et al.Parameterized tiling revisited[C]//Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization.2010:200-209.
[10]TRIFUNOVIC K,COHEN A,EDELSOHN D,et al.Graphite two years after:First lessons learned from real-world polyhedral compilation[C]//GCC Research Opportunities Workshop(GROW'10).2010.
[11]KRISHNAMOORTHY S,BASKARAN M,BONDHUGULAU,et al.Effective automatic parallelization of stencil computations[J].ACMSigplan Notices,2007,42(6):235-244.
[12]BONDHUGULA U,BANDISHTI V,PANANILATH I.Dia-mond tiling:Tiling techniques to maximize parallelism for stencil computations[J].IEEE Transactions on Parallel and Distri-buted Systems,2016,28(5):1285-1298.
[13]GROSSER T,COHEN A,HOLEWINSKI J,et al.Hybrid hexagonal/classical tiling for GPUs[C]//Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization.2014:66-75.
[14]MEHTA S,BEERAKA G,YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,10(4):1-27.
[15]LOW T M,IGUAL F D,SMITH T M,et al.Analytical mode-ling is enough for high-performance BLIS[J].ACM Transactions on Mathematical Software(TOMS),2016,43(2):1-18.
[16]MEHTA S,GARG R,TRIVEDI N,et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]//Proceedings of the 2016 International Conference on Supercompu-ting.2016:1-12.
[17]SHIRAKO J,SHARMA K,FAUZIA N,et al.Analytical bounds for optimal tile size selection[C]//Compiler Construction:21st International Conference,CC 2012,Held as Part of the European Joint Conferences on Theory and Practice of Software,ETAPS 2012,Tallinn,Estonia,March 24-April 1,2012.Proceedings 21.Springer Berlin Heidelberg,2012:101-121.
[18]CHAI X F,LIU S,QU B,et al.Vectorization-Friendly Tile Size Selection Algorithm[J].Computer Engineering and Applications,2020,56(15):37-42.
[19]ZHU Y,PANG J M,XU J L,et al.Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor[J].Computer Science,2021,48(6):10-18.
[20]BAO Y K,ZHANG P,XU X W,et al.Prediction of OptimalLoop Tiling Size for stencil Computation Based on Neural Network Model[J].Computer Science,2022,49(10):18-26.
[21]QU B,LIU S,ZHANG Z Y,et al.Hexagonal Loop Tiling for Jacobi Computation Optimization Method[J].Ruan Jian Xue Bao/Journal of Software,2024,35(8):3721-3738.
[22]NARASIMHAN K,ACHARYA A,BAID A,et al.A practicaltile size selection model for affine loop nests[C]//Proceedings of the ACM International Conference on Supercomputing.2021:27-39.
[23]The LLVM Foundation.The LLVM Compiler Infrastructure[EB/OL].https://llvm.org/.
[24]RAGHESH A,JOHANNES D,TOBIAS G,et al.LLVMFramework for High-Level Loop and Data-Locality Optimizations[EB/OL].https://polly.llvm.org/index.html.
[25]LIU S,WU W G,ZHAO B,et al.Loop Tiling for Optimization of Locality and Parallelism[J].Journal of Computer Research and Development,2015,52(5):1160-1176.
[1] DI Jianqiang, YUAN Liang, ZHANG Yunquan, ZHANG Sijia. Performance Optimization of Complex Stencil in Weather Forecast Model WRF [J]. Computer Science, 2024, 51(4): 56-66.
[2] HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, XU Jinchen. Optimisation of Automatic Matrix Multiplication Mixing Accuracy Based on Polyhedral Models [J]. Computer Science, 2024, 51(12): 110-119.
[3] YANG Zheming, ZUO Lulu, JI Wen. Joint Optimization Method for Node Deployment and Resource Allocation Based on End-EdgeCollaboration [J]. Computer Science, 2024, 51(11A): 240200010-7.
[4] HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen. Automatic Mixing Precision Optimization for Matrix Multiplication Calculation [J]. Computer Science, 2024, 51(11A): 240300057-10.
[5] FU Xiong, FANG Lei, WANG Junchang. Edge Server Placement for Energy Consumption and Load Balancing [J]. Computer Science, 2023, 50(6A): 220300088-5.
[6] XIE Haoshan, LIU Xiaonan, ZHAO Chenyan, LIU Zhengyu. Simulation Implementation of HHL Algorithm Based on Songshan Supercomputer System [J]. Computer Science, 2023, 50(6): 74-80.
[7] YANG Qianlong, JIANG Lingyun. Study on Load Balancing Algorithm of Microservices Based on Machine Learning [J]. Computer Science, 2023, 50(5): 313-321.
[8] CHEN Ziqiang, XIA Zhengyou. Failure Recovery Model for Single Link with Congestion-Avoidance in SDN [J]. Computer Science, 2023, 50(4): 212-219.
[9] WENG Jie, LIN Bing, CHEN Xing. Multi-edge Server Load Balancing Strategy Based on Game Theory [J]. Computer Science, 2023, 50(11A): 221200150-8.
[10] YUAN Peiyan, MA Yiwen. Optimal Edge Server Placement Method Based on Delay and Load [J]. Computer Science, 2023, 50(11A): 220900260-8.
[11] GUO Yingya, WANG Lijuan, GENG Haijun. Edge Server Placement Algorithm Based on Spectral Clustering [J]. Computer Science, 2023, 50(10): 248-257.
[12] TIAN Zhen-zhen, JIANG Wei, ZHENG Bing-xu, MENG Li-min. Load Balancing Optimization Scheduling Algorithm Based on Server Cluster [J]. Computer Science, 2022, 49(6A): 639-644.
[13] GAO Jie, LIU Sha, HUANG Ze-qiang, ZHENG Tian-yu, LIU Xin, QI Feng-bin. Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor [J]. Computer Science, 2022, 49(5): 355-362.
[14] TAN Shuang-jie, LIN Bao-jun, LIU Ying-chun, ZHAO Shuai. Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning [J]. Computer Science, 2022, 49(2): 336-341.
[15] XU Yi-ming, MA Li, FU Ying-xun, LI Yang, MA Dong-chao. Intelligent Routing Technology for Multi-terminal Access in Integrated Network [J]. Computer Science, 2022, 49(12): 332-339.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!