Computer Science ›› 2022, Vol. 49 ›› Issue (10): 18-26.doi: 10.11896/jsjkx.220100147

• High Perfonnance Computing • Previous Articles     Next Articles

Prediction of Optimal Loop Tiling Size for stencil Computation Based on Neural Network Model

BAO Yi-kun1,2, ZHANG Peng2,3, XU Xiao-wen2,3, MO Ze-yao3,4   

  1. 1 Graduate School of China Academy of Engineering Physics,Beijing 100094,China
    2 Institute of Applied Physics and Computational Mathematics,Beijing 100088,China
    3 CAEP Software Center for High Performance Numerical Simulation,Beijing 100088,China
    4 China Academy of Engineering Physics,Mianyang,Sichuan 621900,China
  • Received:2022-01-15 Revised:2022-05-06 Online:2022-10-15 Published:2022-10-13
  • About author:BAO Yi-kun,born in 1997,postgra-duate.Her main research interests include performance optimization,parallel computing and machine learning.
    ZHANG Peng,born in 1990,associate researcher.His main research interests include parallel computing,high performance mathematics softwares and software performance optimizations on multi- and many-core processors.
  • Supported by:
    National Natural Science Foundation of China(62032023).

Abstract: Stencil computation is one kind of the most important loop kernels in scientific and engineering computing applications.Loop tiling can effectively improve the data locality of stencil computation and the degree of computational parallelism,but the best tile size is hard to choose.Traditional tile size selection methods usually have shortcomings in some ways of time overhead,labor cost and model accuracy.In this paper,a tile size selection method based on artificial neural network is proposed to predict the optimal tile size of three-dimensional Jacobi stencil loop programs.Experimental results show that,for 11 real stencil programs,the performance improvement of the programs using the model prediction tile size compared with the non tiling is 2% and 35% in serial and parallel tests respectively.Compared with the well-known grid search method,our method has a similar prediction accuracy,but only takes one 30 thousandth of the online time cost.In addition,compared with the Turbo-tiling method,our method improves the performance of tiled codes nearly 9% in average.

Key words: Stencil computation, Loop tiling technology, Machine learning, Artificial neural network

CLC Number: 

  • TP314
[1]DATTA K.Auto-tuning stencil codes for cache-based multicore platforms[D].Berkeley:University of California,2009.
[2] XU Y T,FU H H,GAN L,et al.Performance Optimization and Analysis for Different Stencil Kernels on Multi-Core and Many-Core Architectures[C]//2013 CCF National Annual Conference on High Performance Computing.2013:628-637.
[3]LIU S,WU W G,ZHAO B,et al.Loop Tiling for Optimization of Locality and Parallelism[J].Journal of Computer Research and Development,2015,52(5):1160-1176.
[4]WOLF M,LAM M.A data Locality Optimizing Algorithm[C]//Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation.1991:30-44.
[5]LAM M S.The cache preformance and optimizations of Blocked Argorithm[J].ACM SIGPLAN Notices,1991(4):63-74.
[6]MITCHELL N,HOGSTEDT K,CARTER L,et al.Quantifying the Multi-Level Nature of Tiling Interactions[J].International Journal of Parallel Programming,1998,26(6):641-670.
[7]SARKAR V,MEGIDDO N.An analytical model for loop tiling and its solution[C]//2000 IEEE International Symposium on Performance Analysis of Systems and Software.ISPASS(Cat.No.00EX422).IEEE,2000:146-153.
[8]HSU C H,KREMER U.A quantitative analysis of tile size selection algorithms[J].The Journal of Supercomputing,2004,27(3):279-294.
[9]LIU J,ZHANG Y,DING W,et al.On-chip cache hierarchy-aware tile scheduling for multicore machines[C]//International Symposium on Code Generation and Optimization(CGO 2011).IEEE,2011:161-170.
[10]MEHTA S,BEERAKA G,YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,10(4):1-27.
[11]MEHTA S,GARG R,TRIVEDI N,et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]//Proceedings of the 2016 International Conference on Supercompu-ting.2016:1-12.
[12]SATO Y,YUKI T,ENDO T.An autotuning framework forscalable execution of tiled code via iterative polyhedral compilation[J].ACM Transactions on Architecture and Code Optimization(TACO),2019,15(4):1-23.
[13]ZHU Y,PANG J M,XU J L,et al.Adaptive Tiling Size Algo-rithm for 3D Stencil Computation on SW26010 Many-core Processor[J].Journal of Computer Science,2021,48(6):10-18.
[14]KISUKI T,KNIJNENBURG P M W,O'BOYLE M F P.Combined selection of tile sizes and unroll factors using iterative compilation[C]//Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques(Cat.No.PR00622).IEEE,2000:237-246.
[15]WHALEY R C,PETITET A,DONGARRA J J.Automated em-pirical optimizations of software and the ATLAS project[J].Parallel Computing,2001,27(1/2):3-35.
[16]KNIJNENBURG P M W,KISUKI T,GALLIVAN K,et al.The effect of cache models on iterative compilation for combined ti-ling and unrolling[J].Concurrency and Computation:Practice and Experience,2004,16(2/3):247-270.
[17]CHEN C,CHAME J,HALL M.Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy[C]//International Symposium on Code Generation and Optimization.IEEE,2005:111-122.
[18]YOTOV K,PINGALI K,STODGHILL P.Think globally,search locally[C]//Proceedings of the 19th Annual Interna-tional Conference on Supercomputing.2005:141-150.
[19]TIWARI A,CHEN C,CHAME J,et al.A scalable auto-tuning framework for compiler optimization[C]//2009 IEEE International Symposium on Parallel & Distributed Processing.IEEE,2009:1-12.
[20]GOTO K,VAN DE GEIJN R.High-performance implementa-tion of the level-3 BLAS[J].ACM Transactions on Mathematical Software(TOMS),2008,35(1):1-14.
[21]SHIRAKO J,SHARMA K,FAUZIA N,et al.Analytical bounds for optimal tile size selection[C]//International Conference on Compiler Construction.Berlin:Springer,2012:101-121.
[22]YUKI T,RENGANARAYANAN L,RAJOPADHYE S,et al.Automatic creation of tile size selection models[C]//Procee-dings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization.2010:190-199.
[23]MALIK A M.Optimal tile size selection problem using machine learning[C]//2012 11th International Conference on Machine Learning and Applications.IEEE,2012:275-280.
[24]ZHANG P,FANG J,YANG C,et al.Optimizing streaming para-llelism on heterogeneous many-core architectures[J].IEEE Transactions on Parallel and Distributed Systems,2020,31(8):1878-1896.
[25]LIU S,CUI Y,JIANG Q,et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing,2018,121:27-41.
[26]CHI H Y,CHEN C B.Prediction of Loop Tiling Size Based on Neural Network[J].Journal of Computer Science,2020:47(8):62-70.
[27]KAMIL S,HUSBANDS P,OLIKER L,et al.Impact of modern memory subsystems on cache optimizations for stencil computations[C]//Proceedings of the 2005 Workshop on Memory System Performance.2005:36-43.
[28]BORCHANI H,VARANDO G,BIELZA C,et al.A survey on multi-output regression[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2015,5(5):216-233.
[1] LENG Dian-dian, DU Peng, CHEN Jian-ting, XIANG Yang. Automated Container Terminal Oriented Travel Time Estimation of AGV [J]. Computer Science, 2022, 49(9): 208-214.
[2] NING Han-yang, MA Miao, YANG Bo, LIU Shi-chang. Research Progress and Analysis on Intelligent Cryptology [J]. Computer Science, 2022, 49(9): 288-296.
[3] HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
[4] LI Yao, LI Tao, LI Qi-fan, LIANG Jia-rui, Ibegbu Nnamdi JULIAN, CHEN Jun-jie, GUO Hao. Construction and Multi-feature Fusion Classification Research Based on Multi-scale Sparse Brain Functional Hyper-network [J]. Computer Science, 2022, 49(8): 257-266.
[5] ZHANG Guang-hua, GAO Tian-jiao, CHEN Zhen-guo, YU Nai-wen. Study on Malware Classification Based on N-Gram Static Analysis Technology [J]. Computer Science, 2022, 49(8): 336-343.
[6] CHEN Ming-xin, ZHANG Jun-bo, LI Tian-rui. Survey on Attacks and Defenses in Federated Learning [J]. Computer Science, 2022, 49(7): 310-323.
[7] XIAO Zhi-hong, HAN Ye-tong, ZOU Yong-pan. Study on Activity Recognition Based on Multi-source Data and Logical Reasoning [J]. Computer Science, 2022, 49(6A): 397-406.
[8] YAO Ye, ZHU Yi-an, QIAN Liang, JIA Yao, ZHANG Li-xiang, LIU Rui-liang. Android Malware Detection Method Based on Heterogeneous Model Fusion [J]. Computer Science, 2022, 49(6A): 508-515.
[9] WANG Fei, HUANG Tao, YANG Ye. Study on Machine Learning Algorithms for Life Prediction of IGBT Devices Based on Stacking Multi-model Fusion [J]. Computer Science, 2022, 49(6A): 784-789.
[10] LI Ya-ru, ZHANG Yu-lai, WANG Jia-chen. Survey on Bayesian Optimization Methods for Hyper-parameter Tuning [J]. Computer Science, 2022, 49(6A): 86-92.
[11] ZHAO Lu, YUAN Li-ming, HAO Kun. Review of Multi-instance Learning Algorithms [J]. Computer Science, 2022, 49(6A): 93-99.
[12] XU Jie, ZHU Yu-kun, XING Chun-xiao. Application of Machine Learning in Financial Asset Pricing:A Review [J]. Computer Science, 2022, 49(6): 276-286.
[13] JI Ying-rui, YUAN Liang, ZHANG Yun-quan. Parallelization and Locality Optimization for Red-Black Gauss-Seidel Stencil [J]. Computer Science, 2022, 49(5): 363-370.
[14] LI Ye, CHEN Song-can. Physics-informed Neural Networks:Recent Advances and Prospects [J]. Computer Science, 2022, 49(4): 254-262.
[15] YAO Xiao-ming, DING Shi-chang, ZHAO Tao, HUANG Hong, LUO Jar-der, FU Xiao-ming. Big Data-driven Based Socioeconomic Status Analysis:A Survey [J]. Computer Science, 2022, 49(4): 80-87.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!