计算机科学 ›› 2022, Vol. 49 ›› Issue (10): 18-26.doi: 10.11896/jsjkx.220100147
包怡坤1,2, 张鹏2,3, 徐小文2,3, 莫则尧3,4
BAO Yi-kun1,2, ZHANG Peng2,3, XU Xiao-wen2,3, MO Ze-yao3,4
摘要: stencil循环是科学与工程计算应用中最主要的计算核心之一。循环分块技术可有效改善stencil循环的数据局部性,提高计算并行度。分块的大小选择对stencil循环的性能影响很大,传统的分块大小选择方法通常在时间开销、人工成本、分块选择精度等方面存在短板,实用性较差。文中提出了一种基于人工神经网络的分块大小选择方法,用于预测三维Jacobi型stencil循环程序的最优分块。对来源于实际数值模拟软件中的11个stencil循环进行最优分块预测,实验结果显示,在单核串行和多核并行两种场景下,程序使用模型预测分块相比不分块的性能提升分别为2%和35%,与网格搜索方法的分块性能相当,但在线预测时间开销仅约为后者的1/30 000。此外,相比基于静态分析的Turbo-tiling方法,预测最优分块的实测性能平均提升了约9%。
中图分类号:
[1]DATTA K.Auto-tuning stencil codes for cache-based multicore platforms[D].Berkeley:University of California,2009. [2] XU Y T,FU H H,GAN L,et al.Performance Optimization and Analysis for Different Stencil Kernels on Multi-Core and Many-Core Architectures[C]//2013 CCF National Annual Conference on High Performance Computing.2013:628-637. [3]LIU S,WU W G,ZHAO B,et al.Loop Tiling for Optimization of Locality and Parallelism[J].Journal of Computer Research and Development,2015,52(5):1160-1176. [4]WOLF M,LAM M.A data Locality Optimizing Algorithm[C]//Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation.1991:30-44. [5]LAM M S.The cache preformance and optimizations of Blocked Argorithm[J].ACM SIGPLAN Notices,1991(4):63-74. [6]MITCHELL N,HOGSTEDT K,CARTER L,et al.Quantifying the Multi-Level Nature of Tiling Interactions[J].International Journal of Parallel Programming,1998,26(6):641-670. [7]SARKAR V,MEGIDDO N.An analytical model for loop tiling and its solution[C]//2000 IEEE International Symposium on Performance Analysis of Systems and Software.ISPASS(Cat.No.00EX422).IEEE,2000:146-153. [8]HSU C H,KREMER U.A quantitative analysis of tile size selection algorithms[J].The Journal of Supercomputing,2004,27(3):279-294. [9]LIU J,ZHANG Y,DING W,et al.On-chip cache hierarchy-aware tile scheduling for multicore machines[C]//International Symposium on Code Generation and Optimization(CGO 2011).IEEE,2011:161-170. [10]MEHTA S,BEERAKA G,YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,10(4):1-27. [11]MEHTA S,GARG R,TRIVEDI N,et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]//Proceedings of the 2016 International Conference on Supercompu-ting.2016:1-12. [12]SATO Y,YUKI T,ENDO T.An autotuning framework forscalable execution of tiled code via iterative polyhedral compilation[J].ACM Transactions on Architecture and Code Optimization(TACO),2019,15(4):1-23. [13]ZHU Y,PANG J M,XU J L,et al.Adaptive Tiling Size Algo-rithm for 3D Stencil Computation on SW26010 Many-core Processor[J].Journal of Computer Science,2021,48(6):10-18. [14]KISUKI T,KNIJNENBURG P M W,O'BOYLE M F P.Combined selection of tile sizes and unroll factors using iterative compilation[C]//Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques(Cat.No.PR00622).IEEE,2000:237-246. [15]WHALEY R C,PETITET A,DONGARRA J J.Automated em-pirical optimizations of software and the ATLAS project[J].Parallel Computing,2001,27(1/2):3-35. [16]KNIJNENBURG P M W,KISUKI T,GALLIVAN K,et al.The effect of cache models on iterative compilation for combined ti-ling and unrolling[J].Concurrency and Computation:Practice and Experience,2004,16(2/3):247-270. [17]CHEN C,CHAME J,HALL M.Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy[C]//International Symposium on Code Generation and Optimization.IEEE,2005:111-122. [18]YOTOV K,PINGALI K,STODGHILL P.Think globally,search locally[C]//Proceedings of the 19th Annual Interna-tional Conference on Supercomputing.2005:141-150. [19]TIWARI A,CHEN C,CHAME J,et al.A scalable auto-tuning framework for compiler optimization[C]//2009 IEEE International Symposium on Parallel & Distributed Processing.IEEE,2009:1-12. [20]GOTO K,VAN DE GEIJN R.High-performance implementa-tion of the level-3 BLAS[J].ACM Transactions on Mathematical Software(TOMS),2008,35(1):1-14. [21]SHIRAKO J,SHARMA K,FAUZIA N,et al.Analytical bounds for optimal tile size selection[C]//International Conference on Compiler Construction.Berlin:Springer,2012:101-121. [22]YUKI T,RENGANARAYANAN L,RAJOPADHYE S,et al.Automatic creation of tile size selection models[C]//Procee-dings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization.2010:190-199. [23]MALIK A M.Optimal tile size selection problem using machine learning[C]//2012 11th International Conference on Machine Learning and Applications.IEEE,2012:275-280. [24]ZHANG P,FANG J,YANG C,et al.Optimizing streaming para-llelism on heterogeneous many-core architectures[J].IEEE Transactions on Parallel and Distributed Systems,2020,31(8):1878-1896. [25]LIU S,CUI Y,JIANG Q,et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing,2018,121:27-41. [26]CHI H Y,CHEN C B.Prediction of Loop Tiling Size Based on Neural Network[J].Journal of Computer Science,2020:47(8):62-70. [27]KAMIL S,HUSBANDS P,OLIKER L,et al.Impact of modern memory subsystems on cache optimizations for stencil computations[C]//Proceedings of the 2005 Workshop on Memory System Performance.2005:36-43. [28]BORCHANI H,VARANDO G,BIELZA C,et al.A survey on multi-output regression[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2015,5(5):216-233. |
[1] | 冷典典, 杜鹏, 陈建廷, 向阳. 面向自动化集装箱码头的AGV行驶时间估计 Automated Container Terminal Oriented Travel Time Estimation of AGV 计算机科学, 2022, 49(9): 208-214. https://doi.org/10.11896/jsjkx.210700028 |
[2] | 宁晗阳, 马苗, 杨波, 刘士昌. 密码学智能化研究进展与分析 Research Progress and Analysis on Intelligent Cryptology 计算机科学, 2022, 49(9): 288-296. https://doi.org/10.11896/jsjkx.220300053 |
[3] | 何强, 尹震宇, 黄敏, 王兴伟, 王源田, 崔硕, 赵勇. 基于大数据的进化网络影响力分析研究综述 Survey of Influence Analysis of Evolutionary Network Based on Big Data 计算机科学, 2022, 49(8): 1-11. https://doi.org/10.11896/jsjkx.210700240 |
[4] | 李瑶, 李涛, 李埼钒, 梁家瑞, Ibegbu Nnamdi JULIAN, 陈俊杰, 郭浩. 基于多尺度的稀疏脑功能超网络构建及多特征融合分类研究 Construction and Multi-feature Fusion Classification Research Based on Multi-scale Sparse Brain Functional Hyper-network 计算机科学, 2022, 49(8): 257-266. https://doi.org/10.11896/jsjkx.210600094 |
[5] | 张光华, 高天娇, 陈振国, 于乃文. 基于N-Gram静态分析技术的恶意软件分类研究 Study on Malware Classification Based on N-Gram Static Analysis Technology 计算机科学, 2022, 49(8): 336-343. https://doi.org/10.11896/jsjkx.210900203 |
[6] | 陈明鑫, 张钧波, 李天瑞. 联邦学习攻防研究综述 Survey on Attacks and Defenses in Federated Learning 计算机科学, 2022, 49(7): 310-323. https://doi.org/10.11896/jsjkx.211000079 |
[7] | 李亚茹, 张宇来, 王佳晨. 面向超参数估计的贝叶斯优化方法综述 Survey on Bayesian Optimization Methods for Hyper-parameter Tuning 计算机科学, 2022, 49(6A): 86-92. https://doi.org/10.11896/jsjkx.210300208 |
[8] | 赵璐, 袁立明, 郝琨. 多示例学习算法综述 Review of Multi-instance Learning Algorithms 计算机科学, 2022, 49(6A): 93-99. https://doi.org/10.11896/jsjkx.210500047 |
[9] | 肖治鸿, 韩晔彤, 邹永攀. 基于多源数据和逻辑推理的行为识别技术研究 Study on Activity Recognition Based on Multi-source Data and Logical Reasoning 计算机科学, 2022, 49(6A): 397-406. https://doi.org/10.11896/jsjkx.210300270 |
[10] | 姚烨, 朱怡安, 钱亮, 贾耀, 张黎翔, 刘瑞亮. 一种基于异质模型融合的 Android 终端恶意软件检测方法 Android Malware Detection Method Based on Heterogeneous Model Fusion 计算机科学, 2022, 49(6A): 508-515. https://doi.org/10.11896/jsjkx.210700103 |
[11] | 王飞, 黄涛, 杨晔. 基于Stacking多模型融合的IGBT器件寿命的机器学习预测算法研究 Study on Machine Learning Algorithms for Life Prediction of IGBT Devices Based on Stacking Multi-model Fusion 计算机科学, 2022, 49(6A): 784-789. https://doi.org/10.11896/jsjkx.210400030 |
[12] | 许杰, 祝玉坤, 邢春晓. 机器学习在金融资产定价中的应用研究综述 Application of Machine Learning in Financial Asset Pricing:A Review 计算机科学, 2022, 49(6): 276-286. https://doi.org/10.11896/jsjkx.210900127 |
[13] | 纪璎芮, 袁良, 张云泉. 红黑Gauss-Seidel Stencil并行性和局部性优化 Parallelization and Locality Optimization for Red-Black Gauss-Seidel Stencil 计算机科学, 2022, 49(5): 363-370. https://doi.org/10.11896/jsjkx.220100119 |
[14] | 么晓明, 丁世昌, 赵涛, 黄宏, 罗家德, 傅晓明. 大数据驱动的社会经济地位分析研究综述 Big Data-driven Based Socioeconomic Status Analysis:A Survey 计算机科学, 2022, 49(4): 80-87. https://doi.org/10.11896/jsjkx.211100014 |
[15] | 李野, 陈松灿. 基于物理信息的神经网络:最新进展与展望 Physics-informed Neural Networks:Recent Advances and Prospects 计算机科学, 2022, 49(4): 254-262. https://doi.org/10.11896/jsjkx.210500158 |
|