计算机科学 ›› 2022, Vol. 49 ›› Issue (10): 44-51.doi: 10.11896/jsjkx.210800247

• 高性能计算* 上一篇    下一篇

基于多面体模型的矩阵乘法向量代码生成

王博漾1, 庞建民1,2, 徐金龙2, 赵捷2, 陶小涵2, 朱雨2   

  1. 1 郑州大学网络空间安全学院 郑州 450000
    2 数学工程与先进计算国家重点实验室(信息工程大学) 郑州 450000
  • 收稿日期:2021-08-27 修回日期:2022-06-02 出版日期:2022-10-15 发布日期:2022-10-13
  • 通讯作者: 庞建民(jianmin_pang@126.com)
  • 作者简介:(w_boyang1997@163.com)
  • 基金资助:
    国家自然科学基金区域创新发展联合基金(U20A20226)

Matrix Multiplication Vector Code Generation Based on Polyhedron Model

WANG Bo-yang1, PANG Jian-min1,2, XU Jin-long2, ZHAO Jie2, TAO Xiao-han2, ZHU Yu2   

  1. 1 School of Cyber Science and Engineering,Zhengzhou University,Zhengzhou 450000,China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing,PLA Information Engineering University,Zhengzhou 450000, China
  • Received:2021-08-27 Revised:2022-06-02 Online:2022-10-15 Published:2022-10-13
  • About author:WANG Bo-yang,born in 1997,postgra-duate.Her main research interests include high-performance computing and so on.
    PANG Jian-min,born in 1964,professor,Ph.D,is a member of China Computer Federation.His main research interests include high-performance computing and information security.
  • Supported by:
    National Natural Science Foundation Regional Innovation and Development Joint Fund(U20A20226).

摘要: 矩阵乘法是众多科学计算的核心,而向量化编程是提升其性能的主要手段之一。针对现有的向量化优化往往存在需要手工进行调优以及与硬件结构映射的问题,基于多面体编译器PPCG,在多面体模型中引入向量代码生成框架,提出了基于多面体模型的矩阵乘法向量代码生成框架。通过对矩阵乘法的向量化方案进行收益分析来确定向量化方案,指导应用框架的代码生成,基于该代码生成框架,有利于矩阵乘法的向量化快速优化。选取13个规模在64×64×64到1 024×1 024×1 024之间的矩阵乘法用例进行实验,结果表明,该框架能够正确生成向量化代码,与基础编译器ICC的自动向量化功能相比,应用该框架生成的向量化代码最高获得了5.09倍的加速和3.39倍的平均加速。

关键词: 矩阵乘法, 多面体模型, 向量化, 调度变换, 代码生成

Abstract: Matrix multiplication is the core of many scientific calculations,and vectorized programming is one of the main means to improve its performance.In view of the existing vectorization optimization problems that often require manual tuning and need to be mapped to the hardware structure,based on the polyhedron compiler PPCG,a vector code generation framework is introduced into the polyhedron model,and a matrix multiplication vector code generation framework based on the polyhedron model is proposed.Through the profit analysis of the matrix multiplication vectorization program,the vectorization program is determined,and the code generation of the application framework is guided.Based on this framework,it is conducive to the rapid optimization of vectorization of matrix multiplication.Selecting 13 matrix multiplication cases with a scale between 64×64×64 and 1 024×1 024×1 024 for experiments.The results show that the framework can generate vectorized code correctly.Compared with the automatic vectorization of the basic compiler ICC,the vectorized code generated by the framework has a speedup of 5.09 times and an average speedup of 3.39 times.

Key words: Matrix multiplication, Polyhedron model, Vectorized, Scheduling transformation, Code generation

中图分类号: 

  • TP312
[1]KANG H,KWON H C,KIM D.HPMaX:heterogeneous parallel matrix multiplication using CPUs and GPUs[J].Computing,2020,102(12):2607-2631.
[2]HEMEIDA A M,HASSAN S A,ALKHALAF S,et al.Optimi-zing matrix-matrix multiplication on intel'sadvanced vector extensions multicore processor[J].Ain Shams Engineering Journal,2020,11(4):1179-1190.
[3]HOSSEIN A,ASADOLLAH S.SIMD programming using Intelvector extensions - ScienceDirect[J].Journal of Parallel and Distributed Computing,2020,135:83-100.
[4]FEAUTRIER P,LENGAUER C.Polyhedron model[J/OL].https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_502.
[5]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Po-lyhedral parallel code generation for CUDA[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,9(4):54:1-54:24.
[6]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI'08).ACM,2008.
[7]GAO W,ZHAO R C,HAN L,et al.Overview of SIMD automa-tic vectorization compilation optimization [J].Journal of Software,2015,26(6):1265-1284.
[8]EDAMATSU T,TAKAHASHI D.Accelerating Large Integer Multiplication Using Intel AVX-512IFMA[M].Algorithms and Architectures for Parallel Processing,2020:60-74.
[9]STEPHENS N,BILES S,BOETTCHER M,et al.The ARMScalable Vector Extension[J].IEEE Micro,2017,37(2):26-39.
[10]ZHAO J,LI Y Y,ZHAO R C.“Black magic” of polyhedral compilation[J].Journal of Software,2018,29(8):2371-2396.
[11]VERDOOLAEGE S,GUELTON S,GROSSER T.ScheduleTrees[J].Geodinamica Acta,2013,25(1/2):86-95.
[12]GROSSER T,VERDOOLAEGE S,COHEN A.Polyhedral AST generation is more than scanning polyhedra[J].Acm Transactions on Programming Languages & Systems,2015,37(4):1-50.
[13]KONG M,VERAS R,STOCK K,et al.When polyhedral transformations meet SIMD code generation[C]// Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation.ACM,2013.
[14]LENGAUER G C.Polly-Performing Polyhedral Optimizations on a Low-Level Intermediate Representation[J].Parallel Processing Letters,2012,22(4):14-53.
[15]VERDOOLAEGE S,JANSSENS G.Scheduling for PPCG[EB/OL].https://www.researchgate.net/publication/317826152_Scheduling_for_PPCG.
[16]LIU Z,TIAN X.Matrix multiplication vectorization method for multi-core vector processors[J].Chinese Journal of Computers,2018,41(10):2251-2264.
[17]LIU G,ZHANG H,MAO R,et al.Optimization of DGEMMFunction for Loongson3B1500 Architecture[J].Small Microcomputer System,2014,35(7):1523-1527.
[1] 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立.
基于GPU加速的并行WMD算法
Parallel WMD Algorithm Based on GPU Acceleration
计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213
[2] 胡伟方, 陈云, 李颖颖, 商建东.
基于数据重用分析的多面体循环合并策略
Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation
计算机科学, 2021, 48(12): 49-58. https://doi.org/10.11896/jsjkx.210200071
[3] 杨萍, 王生原.
CompCert编译器目标代码生成机制分析
Analysis of Target Code Generation Mechanism of CompCert Compiler
计算机科学, 2020, 47(9): 17-23. https://doi.org/10.11896/jsjkx.200400018
[4] 韩晓冬, 高飞, 张立炜.
适用于线性网络编码关键路径的实时性算法
Novel Real-time Algorithm for Critical Path of Linear Network Coding
计算机科学, 2020, 47(9): 232-237. https://doi.org/10.11896/jsjkx.190800023
[5] 丁嵘, 于千惠.
基于AADL的自主无人系统可成长框架
Growth Framework of Autonomous Unmanned Systems Based on AADL
计算机科学, 2020, 47(12): 87-92. https://doi.org/10.11896/jsjkx.201100173
[6] 徐启泽, 韩文廷, 陈俊仕, 安虹.
众核平台上广度优先搜索算法的优化
Optimization of Breadth-first Search Algorithm Based on Many-core Platform
计算机科学, 2019, 46(1): 314-319. https://doi.org/10.11896/j.issn.1002-137X.2019.01.049
[7] 周蓓, 黄永忠, 许瑾晨, 郭绍忠.
向量数学库的向量化方法研究
Study on SIMD Method of Vector Math Library
计算机科学, 2019, 46(1): 320-324. https://doi.org/10.11896/j.issn.1002-137X.2019.01.050
[8] 姚金阳, 赵荣彩, 王琦, 李颖颖.
面向间接数组索引的向量化方法
Vectorization Methods for Indirect Array Index
计算机科学, 2018, 45(9): 220-223. https://doi.org/10.11896/j.issn.1002-137X.2018.09.036
[9] 赵澄, 陈君新, 姚明海.
基于SVM分类器的XSS攻击检测技术
XSS Attack Detection Technology Based on SVM Classifier
计算机科学, 2018, 45(11A): 356-360.
[10] 杨飞,马昱春,侯金,徐宁.
基于MPSoC并行调度的矩阵乘法加速算法研究
Research on Acceleration of Matrix Multiplication Based on Parallel Scheduling on MPSoC
计算机科学, 2017, 44(8): 36-41. https://doi.org/10.11896/j.issn.1002-137X.2017.08.007
[11] 吴卫祖,刘利群,谢冬青.
基于神经网络的异构网络向量化表示方法
Vectorized Representation of Heterogeneous Network Based on Neural Networks
计算机科学, 2017, 44(5): 272-275. https://doi.org/10.11896/j.issn.1002-137X.2017.05.049
[12] 郝鑫,郭绍忠.
基于Intel MIC架构的3D有限差分算法优化
Optimization of 3D Finite Difference Algorithm on Intel MIC
计算机科学, 2017, 44(5): 26-32. https://doi.org/10.11896/j.issn.1002-137X.2017.05.005
[13] 韩林,徐金龙,李颖颖,王阳.
面向部分向量化的循环分布及聚合优化
Method of Loop Distribution and Aggregation for Partial Vectorization
计算机科学, 2017, 44(2): 70-74. https://doi.org/10.11896/j.issn.1002-137X.2017.02.008
[14] 陈勇,徐超.
基于符号执行和人机交互的自动向量化方法
Symbolic Execution and Human-Machine Interaction Based Auto Vectorization Method
计算机科学, 2016, 43(Z6): 461-466. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.109
[15] 于海宁,韩林,李鹏远.
面向自动向量化的结构体优化
Structure Optimization for Automatic Vectorization
计算机科学, 2016, 43(2): 210-215. https://doi.org/10.11896/j.issn.1002-137X.2016.02.045
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!