计算机科学 ›› 2022, Vol. 49 ›› Issue (10): 44-51.doi: 10.11896/jsjkx.210800247
王博漾1, 庞建民1,2, 徐金龙2, 赵捷2, 陶小涵2, 朱雨2
WANG Bo-yang1, PANG Jian-min1,2, XU Jin-long2, ZHAO Jie2, TAO Xiao-han2, ZHU Yu2
摘要: 矩阵乘法是众多科学计算的核心,而向量化编程是提升其性能的主要手段之一。针对现有的向量化优化往往存在需要手工进行调优以及与硬件结构映射的问题,基于多面体编译器PPCG,在多面体模型中引入向量代码生成框架,提出了基于多面体模型的矩阵乘法向量代码生成框架。通过对矩阵乘法的向量化方案进行收益分析来确定向量化方案,指导应用框架的代码生成,基于该代码生成框架,有利于矩阵乘法的向量化快速优化。选取13个规模在64×64×64到1 024×1 024×1 024之间的矩阵乘法用例进行实验,结果表明,该框架能够正确生成向量化代码,与基础编译器ICC的自动向量化功能相比,应用该框架生成的向量化代码最高获得了5.09倍的加速和3.39倍的平均加速。
中图分类号:
[1]KANG H,KWON H C,KIM D.HPMaX:heterogeneous parallel matrix multiplication using CPUs and GPUs[J].Computing,2020,102(12):2607-2631. [2]HEMEIDA A M,HASSAN S A,ALKHALAF S,et al.Optimi-zing matrix-matrix multiplication on intel'sadvanced vector extensions multicore processor[J].Ain Shams Engineering Journal,2020,11(4):1179-1190. [3]HOSSEIN A,ASADOLLAH S.SIMD programming using Intelvector extensions - ScienceDirect[J].Journal of Parallel and Distributed Computing,2020,135:83-100. [4]FEAUTRIER P,LENGAUER C.Polyhedron model[J/OL].https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_502. [5]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Po-lyhedral parallel code generation for CUDA[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,9(4):54:1-54:24. [6]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI'08).ACM,2008. [7]GAO W,ZHAO R C,HAN L,et al.Overview of SIMD automa-tic vectorization compilation optimization [J].Journal of Software,2015,26(6):1265-1284. [8]EDAMATSU T,TAKAHASHI D.Accelerating Large Integer Multiplication Using Intel AVX-512IFMA[M].Algorithms and Architectures for Parallel Processing,2020:60-74. [9]STEPHENS N,BILES S,BOETTCHER M,et al.The ARMScalable Vector Extension[J].IEEE Micro,2017,37(2):26-39. [10]ZHAO J,LI Y Y,ZHAO R C.“Black magic” of polyhedral compilation[J].Journal of Software,2018,29(8):2371-2396. [11]VERDOOLAEGE S,GUELTON S,GROSSER T.ScheduleTrees[J].Geodinamica Acta,2013,25(1/2):86-95. [12]GROSSER T,VERDOOLAEGE S,COHEN A.Polyhedral AST generation is more than scanning polyhedra[J].Acm Transactions on Programming Languages & Systems,2015,37(4):1-50. [13]KONG M,VERAS R,STOCK K,et al.When polyhedral transformations meet SIMD code generation[C]// Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation.ACM,2013. [14]LENGAUER G C.Polly-Performing Polyhedral Optimizations on a Low-Level Intermediate Representation[J].Parallel Processing Letters,2012,22(4):14-53. [15]VERDOOLAEGE S,JANSSENS G.Scheduling for PPCG[EB/OL].https://www.researchgate.net/publication/317826152_Scheduling_for_PPCG. [16]LIU Z,TIAN X.Matrix multiplication vectorization method for multi-core vector processors[J].Chinese Journal of Computers,2018,41(10):2251-2264. [17]LIU G,ZHANG H,MAO R,et al.Optimization of DGEMMFunction for Loongson3B1500 Architecture[J].Small Microcomputer System,2014,35(7):1523-1527. |
[1] | 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立. 基于GPU加速的并行WMD算法 Parallel WMD Algorithm Based on GPU Acceleration 计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213 |
[2] | 胡伟方, 陈云, 李颖颖, 商建东. 基于数据重用分析的多面体循环合并策略 Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation 计算机科学, 2021, 48(12): 49-58. https://doi.org/10.11896/jsjkx.210200071 |
[3] | 杨萍, 王生原. CompCert编译器目标代码生成机制分析 Analysis of Target Code Generation Mechanism of CompCert Compiler 计算机科学, 2020, 47(9): 17-23. https://doi.org/10.11896/jsjkx.200400018 |
[4] | 韩晓冬, 高飞, 张立炜. 适用于线性网络编码关键路径的实时性算法 Novel Real-time Algorithm for Critical Path of Linear Network Coding 计算机科学, 2020, 47(9): 232-237. https://doi.org/10.11896/jsjkx.190800023 |
[5] | 丁嵘, 于千惠. 基于AADL的自主无人系统可成长框架 Growth Framework of Autonomous Unmanned Systems Based on AADL 计算机科学, 2020, 47(12): 87-92. https://doi.org/10.11896/jsjkx.201100173 |
[6] | 徐启泽, 韩文廷, 陈俊仕, 安虹. 众核平台上广度优先搜索算法的优化 Optimization of Breadth-first Search Algorithm Based on Many-core Platform 计算机科学, 2019, 46(1): 314-319. https://doi.org/10.11896/j.issn.1002-137X.2019.01.049 |
[7] | 周蓓, 黄永忠, 许瑾晨, 郭绍忠. 向量数学库的向量化方法研究 Study on SIMD Method of Vector Math Library 计算机科学, 2019, 46(1): 320-324. https://doi.org/10.11896/j.issn.1002-137X.2019.01.050 |
[8] | 姚金阳, 赵荣彩, 王琦, 李颖颖. 面向间接数组索引的向量化方法 Vectorization Methods for Indirect Array Index 计算机科学, 2018, 45(9): 220-223. https://doi.org/10.11896/j.issn.1002-137X.2018.09.036 |
[9] | 赵澄, 陈君新, 姚明海. 基于SVM分类器的XSS攻击检测技术 XSS Attack Detection Technology Based on SVM Classifier 计算机科学, 2018, 45(11A): 356-360. |
[10] | 杨飞,马昱春,侯金,徐宁. 基于MPSoC并行调度的矩阵乘法加速算法研究 Research on Acceleration of Matrix Multiplication Based on Parallel Scheduling on MPSoC 计算机科学, 2017, 44(8): 36-41. https://doi.org/10.11896/j.issn.1002-137X.2017.08.007 |
[11] | 吴卫祖,刘利群,谢冬青. 基于神经网络的异构网络向量化表示方法 Vectorized Representation of Heterogeneous Network Based on Neural Networks 计算机科学, 2017, 44(5): 272-275. https://doi.org/10.11896/j.issn.1002-137X.2017.05.049 |
[12] | 郝鑫,郭绍忠. 基于Intel MIC架构的3D有限差分算法优化 Optimization of 3D Finite Difference Algorithm on Intel MIC 计算机科学, 2017, 44(5): 26-32. https://doi.org/10.11896/j.issn.1002-137X.2017.05.005 |
[13] | 韩林,徐金龙,李颖颖,王阳. 面向部分向量化的循环分布及聚合优化 Method of Loop Distribution and Aggregation for Partial Vectorization 计算机科学, 2017, 44(2): 70-74. https://doi.org/10.11896/j.issn.1002-137X.2017.02.008 |
[14] | 陈勇,徐超. 基于符号执行和人机交互的自动向量化方法 Symbolic Execution and Human-Machine Interaction Based Auto Vectorization Method 计算机科学, 2016, 43(Z6): 461-466. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.109 |
[15] | 于海宁,韩林,李鹏远. 面向自动向量化的结构体优化 Structure Optimization for Automatic Vectorization 计算机科学, 2016, 43(2): 210-215. https://doi.org/10.11896/j.issn.1002-137X.2016.02.045 |
|