Computer Science ›› 2020, Vol. 47 ›› Issue (8): 62-70.doi: 10.11896/jsjkx.191200180


Prediction of Loop Tiling Size Based on Neural Network

CHI Hao-yu, CHEN Chang-bo   

  1. Chongqing School, University of Chinese Academy of Sciences, Chongqing 400714, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2020-08-15 Published:2020-08-10
  • About author:CHI Hao-yu, born in 1996, postgraduate. His main research interests include compiler optimization and machine learning.
    CHEN Chang-bo, born in 1981, Ph.D, associate professor. His main research interests include symbolic-numeric computation, automatic parallelization and optimization of computer programs, and machine learning.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (11771421, 11671377, 61572024), the Chinese Academy of Sciences “Light of the West” Program, and the Chongqing Academician-led Science and Technology Innovation Guidance Project (cstc2018jcyj-yszxX0002).

Abstract: Optimization of loop programs is an important topic in program optimization. As a typical loop optimization technique, loop tiling has been widely studied and applied. The selection of tile size, which is complex and highly dependent on both program and hardware, has a great impact on loop performance. Traditional approaches based on static analysis and heuristic search incur high labor and time costs and lack generality and portability. Therefore, a neural network, which is good at representing high-dimensional information, is adopted to learn the hidden correlation between tile size and program performance, a correlation that results from complex interactions between program and hardware. A new group of 29 features is extracted based on the problem size, the structure of the loop, and the locality of the operations within the loop. Experiments are carried out on six kinds of kernel programs (3D loops over 2D data), totaling hundreds of thousands of lines, with random sizes in the range of 1024 to 2048. The sequential model (TSS-T6) achieves an average speedup of 6.64 over GCC -O2, 98.5% of the average maximum available performance found by exhaustive search, and an average 9.9% performance improvement over Pluto. The parallel model (TSSP-T6-Search) achieves an average speedup of 2.41 over the OpenMP default optimization, 91.7% of the average maximum available performance found by exhaustive search, and an average 9% performance improvement over Pluto's default parallel tiling optimization.
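To make the technique under study concrete, the following is an illustrative sketch (not code from the paper) of loop tiling applied to a 3-deep loop nest, here matrix multiplication. The tile size T is the hypothetical parameter whose value the paper's neural network model learns to predict; a well-chosen T keeps each tile's working set cache-resident.

```python
def matmul_naive(A, B, n):
    # Reference untiled 3D loop nest over 2D data.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T):
    # Same computation, restructured so each T x T x T tile is
    # finished before moving on; T is the tile size to be selected.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

Any tile size yields the same result; only the memory-access order, and hence cache behavior, changes, which is why tile size selection is a pure performance-tuning problem.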

Key words: Artificial neural network, Automatic tuning, Cache optimization, Compilation optimization, Loop tiling

CLC Number: TP314
[1] KASPERSKY K.Code Optimization:Effective Memory Usage[M].A-List Publishing, 2003.
[2] ASHOURI A H, PALERMO G, CAVAZOS J, et al.Automatic Tuning of Compilers Using Machine Learning[M].Springer International Publishing, 2018.
[3] WANG Z, O’BOYLE M.Machine learning in compiler optimization[J].Proceedings of the IEEE, 2018, 106(11):1879-1901.
[4] ASHOURI A H, KILLIAN W, CAVAZOS J, et al.A Survey onCompiler Autotuning using Machine Learning[J].ACM Computing Surveys, 2018, 51(5):1-42.
[5] LIU S, WU W G, ZHAO B, et al.Loop tiling technology for local and parallel optimization[J].Journal of Computer Research and Development, 2015, 52(5):1160-1176.
[6] YUKI T, RENGANARAYANAN L, RAJOPADHYE S, et al.Automatic creation of tile size selection models[C]∥Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization.ACM, 2010:190-199.
[7] KISUKI T, KNIJNENBURG P M W, O'BOYLE M F P.Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation[C]∥Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques.Piscataway, NJ:IEEE, 2000:237-246.
[8] STEPHENSON M, AMARASINGHE S P.Predicting unroll factors using supervised classification[C]∥International Symposium on Code Generation & Optimization.IEEE, 2005.
[9] CUMMINS C, PETOUMENOS P, WANG Z, et al.End-to-end deep learning of optimization heuristics[C]∥2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).IEEE, 2017:219-232.
[10]ASHOURI A H, MARIANI G, PALERMO G, et al.Cobayn:Compiler autotuning framework using bayesian networks[J].ACM Transactions on Architecture and Code Optimization (TACO), 2016, 13(2):1-25.
[11]WANG Z, O’BOYLE M F P.Mapping parallelism to multi-cores:a machine learning based approach[C]∥ACM Sigplan Notices.ACM, 2009:75-84.
[12]CASTRO M, GOES L F W, RIBEIRO C P, et al.A machine learning-based approach for thread mapping on transactional memory applications[C]∥2011 18th International Conference on High Performance Computing.IEEE, 2011:1-10.
[13]KIM D G, RENGANARAYANAN L, ROSTRON D, et al.Multi-level tiling:M for the price of one[C]∥Proceedings of the 2007 ACM/IEEE conference on Supercomputing(SC’ 07).IEEE, 2007:1-12.
[14]MEHTA S, BEERAKA G, YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization (TACO), 2013, 10(4):1-27.
[15]MEHTA S, GARG R, TRIVEDI N, et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]∥Proceedings of the 2016 International Conference on Supercomputing.ACM, 2016:1-12.
[16]CHAME J, MOON S.A tile selection algorithm for data locality and cache interference[C]∥Proceedings of the 13th International Conference on Supercomputing.ACM, 1999:492-499.
[17]COLEMAN S, MCKINLEY K S.Tile size selection using cache organization and data layout[J].ACM SIGPLAN Notices, 1995, 30(6):279-290.
[18]YOTOV K, PINGALI K, STODGHILL P.Think globally, search locally[C]∥Proceedings of the 19th annual international conference on Supercomputing.ACM, 2005:141-150.
[19]WHALEY R C, DONGARRA J J.Automatically tuned linear algebra software[C]∥Proceedings of the 1998 ACM/IEEE Conference on Supercomputing(SC’98).IEEE, 1998:38-38.
[20]WHALEY R C, PETITET A, DONGARRA J J.Automated empirical optimizations of software and the ATLAS project[J].Parallel computing, 2001, 27(1/2):3-35.
[21]SHIRAKO J, SHARMA K, FAUZIA N, et al.Analytical bounds for optimal tile size selection[C]∥International Conference on Compiler Construction.Springer, Berlin, Heidelberg, 2012:101-121.
[22]FRAGUELA B B, CARMUEJA M G, ANDRADE D.Optimal Tile Size Selection Guided by Analytical Models[C]∥Parallel Computing, 2005:565-572.
[23]LIU S, CUI Y, JIANG Q, et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing, 2018, 121:27-41.
[24]RAHMAN M, POUCHET L N, SADAYAPPAN P.Neural network assisted tile size selection[C]∥International Workshop on Automatic Performance Tuning.Berkeley, CA:Springer Verlag.2010.
[25]MALIK A M.Optimal tile size selection problem using machine learning[C]∥2012 11th International Conference on Machine Learning and Applications.IEEE, 2012, 2:275-280.
[26]GOODFELLOW I J, BENGIO Y, COURVILLE A.Deep Learning (Adaptive Computation and Machine Learning)[M].MIT Press, 2016.
[27]BONDHUGULA U, HARTONO A, RAMANUJAM J, et al.A practical automatic polyhedral parallelizer and locality optimizer[C]∥Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation.ACM, 2008:101-113.
[28]SMITH A J.Cache Memories[J].ACM Computing Surveys, 1982, 14(3):473-530.
[29]LEVINTHAL D.Performance analysis guide for Intel Core i7 processor and Intel Xeon 5500 processors[J].Intel Performance Analysis Guide, 2009, 30:18.
[30]PATTERSON D A, HENNESSY J L, GOLDBERG D.Computer Architecture:A Quantitative Approach[M].Morgan Kaufmann Publishers Inc., 2008:153-162.
[31]POUCHET L N.Polybench:The polyhedral benchmark suite[J/OL].http://www.cs.ucla.edu/pouchet/software/polybench, 2012.
[32]IPEK E, MCKEE S A, SINGH K, et al.Efficient architectural design space exploration via predictive modeling[J].ACM Transactions on Architecture and Code Optimization (TACO), 2008, 4(4):1-34.
[33]LEATHER H, BONILLA E, O’BOYLE M.Automatic feature generation for machine learning based optimizing compilation[C]∥Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization.IEEE Computer Society, 2009:81-91.
[34]HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al.Improving neural networks by preventing co-adaptation of feature detectors[J].arXiv:1207.0580, 2012.
[35]LIU F, LI G, HU X, et al.Research progress of program understanding based on deep learning[J].Computer Research and Development, 2019, 56(8):1605-1620.
[36]JIN Z, LIU F, LI G.Program understanding:present and future[J].Journal of Software, 2019, 30(1):110-126.