计算机科学 ›› 2020, Vol. 47 ›› Issue (8): 62-70.doi: 10.11896/jsjkx.191200180
所属专题: 高性能计算
池昊宇, 陈长波
CHI Hao-yu, CHEN Chang-bo
摘要: 循环程序的优化一直是程序优化的重点, 循环分块作为一种典型的循环程序优化技术已被广泛地研究和应用。分块大小的选择对循环程序的性能有着重要影响, 分块大小的选择复杂多变且高度依赖程序和硬件。传统的静态分析和启发式经验搜索的人工和时间成本过高, 缺少通用性和可移植性。为此, 考虑使用有良好高维表示特性的神经网络方法来学习程序与硬件复杂交互过程中分块大小与程序性能的隐含关联。从问题规模、循环结构、循环内操作的局部性等方面抽取出一组新的29维特征, 对问题规模为1024~2048的随机大小的6类内核程序(3维循环、2维数据)的数十万行示例进行实验。串行模型(TSS-T6)相比GCC-O2默认优化实现了6.64倍的平均加速比, 相比穷尽搜索实现了98.5%的平均最大可用性能, 相比Pluto默认分块优化实现了平均9.9%的性能提升。并行模型(TSSP-T6-Search)相比OpenMP默认优化实现了2.41倍的平均加速比, 相比穷尽搜索实现了91.7%的平均最大可用性能, 同时与Pluto默认分块并行优化相比得到了平均9%的性能提升。
中图分类号:
[1] KASPERSKY K.Code Optimization:Effective Memory Usage[M].A-List Publishing, 2003. [2] ASHOURI A H, PALERMO G, CAVAZOS J, et al.AutomaticTuning of Compilers Using Machine Learning[M].Springer International Publishing, 2018. [3] WANG Z, O’BOYLE M.Machine learning in compiler optimization[J].Proceedings of the IEEE, 2018, 106(11):1879-1901. [4] ASHOURI A H, KILLIAN W, CAVAZOS J, et al.A Survey onCompiler Autotuning using Machine Learning[J].ACM Computing Surveys, 2018, 51(5):1-42. [5] LIU S, WU W G, ZHAO B, et al.Loop tiling technology for local and parallel optimization[J].Computer research and development, 2015, 52(5):1160-1176. [6] YUKI T, RENGANARAYANAN L, RAJOPADHYE S, et al.Automatic creation of tile size selection models[C]∥Procee-dings of the 8th annual IEEE/ACM international symposium on Code generation and optimization.ACM, 2010:190-199. [7] KISUKI T, KNIJNENBURG P M W, O'BOYLE M F P.Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation[C]∥Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques.Piscataway, NJ:IEEE, 2000:237-246. [8] STEPHENSON M, AMARASINGHE S P.Predicting unrollfactors using supervised classification[C]∥ International Symposium on Code Generation & Optimization.IEEE, 2005. [9] CUMMINS C, PETOUMENOS P, WANG Z, et al.End-to-end deep learning of optimization heuristics[C]∥2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).IEEE, 2017:219-232. [10]ASHOURI A H, MARIANI G, PALERMO G, et al.Cobayn:Compiler autotuning framework using bayesian networks[J].ACM Transactions on Architecture and Code Optimization (TACO), 2016, 13(2):1-25. [11]WANG Z, O’BOYLE M F P.Mapping parallelism to multi-cores:a machine learning based approach[C]∥ACM Sigplan Notices.ACM, 2009:75-84. [12]CASTRO M, GOES L F W, RIBEIRO C P, et al.A machine learning-based approach for thread mapping on transactional memory applications[C]∥2011 18th International Conference on High Performance Computing.IEEE, 2011:1-10. [13]KIM D G, RENGANARAYANAN L, ROSTRON D, et al.Multi-level tiling:M for the price of one[C]∥Proceedings of the 2007 ACM/IEEE conference on Supercomputing(SC’ 07).IEEE, 2007:1-12. [14]MEHTA S, BEERAKA G, YEW P C.Tile size selection revisited[J].ACM Transactions on Architecture and Code Optimization (TACO), 2013, 10(4):1-27. [15]MEHTA S, GARG R, TRIVEDI N, et al.Turbotiling:Leveraging prefetching to boost performance of tiled codes[C]∥Proceedings of the 2016 International Conference on Supercompu-ting.ACM, 2016:1-12. [16]CHAME J, MOON S.A tile selection algorithm for data locality and cache interference[C]∥Proceedings of the 13th Internatio-nal Conference on Supercomputing.ACM, 1999:492-499. [17]COLEMAN S, MCKINLEY K S.Tile size selection using cache organization and data layout[J].ACM SIGPLAN Notices, 1995, 30(6):279-290. [18]YOTOV K, PINGALI K, STODGHILL P.Think globally, search locally[C]∥Proceedings of the 19th annual international conference on Supercomputing.ACM, 2005:141-150. [19]WHALEY R C, DONGARRA J J.Automatically tuned linear algebra software[C]∥Proceedings of the 1998 ACM/IEEE Conference on Supercomputing(SC’98).IEEE, 1998:38-38. [20]WHALEY R C, PETITET A, DONGARRA J J.Automated empirical optimizations of software and the ATLAS project[J].Parallel computing, 2001, 27(1/2):3-35. [21]SHIRAKO J, SHARMA K, FAUZIA N, et al.Analytical bounds for optimal tile size selection[C]∥International Conference on Compiler Construction.Springer, Berlin, Heidelberg, 2012:101-121. [22]FRAGUELA B B, CARMUEJA M G, ANDRADE D.Optimal Tile Size Selection Guided by Analytical Models[C]∥Parallel Computing, 2005:565-572. [23]LIU S, CUI Y, JIANG Q, et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing, 2018, 121:27-41. [24]RAHMAN M, POUCHET L N, SADAYAPPAN P.Neural network assisted tile size selection[C]∥International Workshop on Automatic Performance Tuning.Berkeley, CA:Springer Verlag.2010. [25]MALIK A M.Optimal tile size selection problem using machine learning[C]∥2012 11th International Conference on Machine Learning and Applications.IEEE, 2012, 2:275-280. [26]BENGIO Y, COURVILLE A, GOODFELLOW I J.Deep learning:adaptive computation and machine learning[J].Bengio.A.Courville, 2016, 3:56-69. [27]BONDHUGULA U, HARTONO A, RAMANUJAM J, et al.A practical automatic polyhedral parallelizer and locality optimizer[C]∥Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation.ACM, 2008:101-113. [28]ALAN J S.Cache Memories[J].ACM Computing Surveys, 1982, 14(3):473-530. [29]LEVINTHAL D.Performance analysis guide for intel core i7processor and intel xeon 5500 processors[J].Intel Performance Analysis Guide, 2009, 30:18. [30]PATTERSON D A, HENNESSY J L, GOLDBERG D.Computer architecture:a quantitative approach[M]∥Computer Architecture:A Quantitative Approach.Morgan Kaufmann Publishers Inc.2008, 153-162. [31]POUCHET L N.Polybench:The polyhedral benchmark suite[J/OL].http://www.cs.ucla.edu/pouchet/software/polybench, 2012. [32]IPEK E, MCKEE S A, SINGH K, et al.Efficient architectural design space exploration via predictive modeling[J].ACM Transactions on Architecture and Code Optimization (TACO), 2008, 4(4):1-34. [33]LEATHER H, BONILLA E, O’BOYLE M.Automatic feature generation for machine learning based optimizing compilation[C]∥Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization.IEEE Computer Society, 2009:81-91. [34]HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al.Improving neural networks by preventing co-adaptation of feature detectors[J].arXiv:1207.0580, 2012. [35]LIU F, LI G, HU X, et al.Research progress of program understanding based on deep learning[J].Computer Research and Development, 2019, 56(8):1605-1620. [36]JIN Z, LIU F, LI G.Program understanding:present and future[J].Journal of Software, 2019, 30(1):110-126. |
[1] | 宁晗阳, 马苗, 杨波, 刘士昌. 密码学智能化研究进展与分析 Research Progress and Analysis on Intelligent Cryptology 计算机科学, 2022, 49(9): 288-296. https://doi.org/10.11896/jsjkx.220300053 |
[2] | 陆浩松, 胡勇华, 王书盈, 周新莲, 李慧祥. 向量DSP的混合资源启发式循环展开因子选择方法研究 Study on Hybrid Resource Heuristic Loop Unrolling Factor Selection Method Based on Vector DSP 计算机科学, 2022, 49(6A): 777-783. https://doi.org/10.11896/jsjkx.210400146 |
[3] | 池昊宇, 陈长波. 基于机器学习的编译器自动调优综述 Survey on Automatic Tuning of Compilers by Machine Learning 计算机科学, 2022, 49(1): 241-251. https://doi.org/10.11896/jsjkx.210100113 |
[4] | 唐镇, 胡勇华, 陆浩松, 王书盈. 基于弱约束指派的DSP寄存器偶对分配算法研究 Research on DSP Register Pairs Allocation Algorithm with Weak Assigning Constraints 计算机科学, 2021, 48(6A): 587-595. https://doi.org/10.11896/jsjkx.200600061 |
[5] | 胡伟方, 陈云, 李颖颖, 商建东. 基于数据重用分析的多面体循环合并策略 Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation 计算机科学, 2021, 48(12): 49-58. https://doi.org/10.11896/jsjkx.210200071 |
[6] | 王鹏, 苏伟, 张久文, 刘映杰, 王臻睿. 基于船舶自动识别系统与人工神经网络的船舶载重预测 Prediction of Vessel Load Based on Vessel Automatic Identification System and Artificial Neural Network 计算机科学, 2020, 47(6A): 49-53. https://doi.org/10.11896/JsJkx.191000074 |
[7] | 徐江峰谭玉龙. 基于机器学习的HBase配置参数优化研究 Research on HBase Configuration Parameter Optimization Based on Machine Learning 计算机科学, 2020, 47(6A): 474-479. https://doi.org/10.11896/JsJkx.190900046 |
[8] | 张经, 杨健, 苏鹏. 语音识别中单音节识别研究综述 Survey of Monosyllable Recognition in Speech Recognition 计算机科学, 2020, 47(11A): 172-174. https://doi.org/10.11896/jsjkx.200200006 |
[9] | 韦建文,许志耿,王丙强,Simon SEE,林新华. 异构集群上的宏基因组聚类优化 Accelerating Gene Clustering on Heterogeneous Clusters 计算机科学, 2017, 44(3): 20-22. https://doi.org/10.11896/j.issn.1002-137X.2017.03.005 |
[10] | 赵章明,冯径,施恩,舒晓村. 带启发信息的蚁群神经网络训练算法 h-ACOR:An ACOR Algorithm with Heuristic Information for Neural Network Training 计算机科学, 2017, 44(11): 284-288. https://doi.org/10.11896/j.issn.1002-137X.2017.11.043 |
[11] | 周奚,薛善良. 基于改进的粗糙集和神经网络的WSN故障诊断 WSN Fault Diagnosis with Improved Rough Set and Neural Network 计算机科学, 2016, 43(Z11): 21-25. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.005 |
[12] | 乔非,葛彦昊. 基于BP神经网络的就业招聘企业客户分类问题研究 Customer Classification Model of Employers by Using BP Neural Networks 计算机科学, 2015, 42(Z11): 1-4. |
[13] | 丁山,宋丽晓. 一种改进的视网膜图像中微小动脉瘤的检测算法 Improved Method of Microaneurysm Detection Algorithm Based on Digital Fundus Images 计算机科学, 2014, 41(12): 269-274. https://doi.org/10.11896/j.issn.1002-137X.2014.12.058 |
[14] | 尚兴宏,尚曦乐,章静,钱焕延. 无线传感器节点的故障诊断研究 Research on Fault Diagnosis of Wireless Sensor Nodes 计算机科学, 2013, 40(Z6): 327-329. |
[15] | 葛红美,徐超,陈念,廖希密. 一种面向嵌入式系统总线的低功耗优化方法 Low Power Optimization Method Oriented to Embedded System’s Bus 计算机科学, 2013, 40(12): 31-36. |
|