Computer Science ›› 2022, Vol. 49 ›› Issue (10): 44-51.doi: 10.11896/jsjkx.210800247

• High Perfonnance Computing • Previous Articles     Next Articles

Matrix Multiplication Vector Code Generation Based on Polyhedron Model

WANG Bo-yang1, PANG Jian-min1,2, XU Jin-long2, ZHAO Jie2, TAO Xiao-han2, ZHU Yu2   

  1. 1 School of Cyber Science and Engineering,Zhengzhou University,Zhengzhou 450000,China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing,PLA Information Engineering University,Zhengzhou 450000, China
  • Received:2021-08-27 Revised:2022-06-02 Online:2022-10-15 Published:2022-10-13
  • About author:WANG Bo-yang,born in 1997,postgra-duate.Her main research interests include high-performance computing and so on.
    PANG Jian-min,born in 1964,professor,Ph.D,is a member of China Computer Federation.His main research interests include high-performance computing and information security.
  • Supported by:
    National Natural Science Foundation Regional Innovation and Development Joint Fund(U20A20226).

Abstract: Matrix multiplication is the core of many scientific calculations,and vectorized programming is one of the main means to improve its performance.In view of the existing vectorization optimization problems that often require manual tuning and need to be mapped to the hardware structure,based on the polyhedron compiler PPCG,a vector code generation framework is introduced into the polyhedron model,and a matrix multiplication vector code generation framework based on the polyhedron model is proposed.Through the profit analysis of the matrix multiplication vectorization program,the vectorization program is determined,and the code generation of the application framework is guided.Based on this framework,it is conducive to the rapid optimization of vectorization of matrix multiplication.Selecting 13 matrix multiplication cases with a scale between 64×64×64 and 1 024×1 024×1 024 for experiments.The results show that the framework can generate vectorized code correctly.Compared with the automatic vectorization of the basic compiler ICC,the vectorized code generated by the framework has a speedup of 5.09 times and an average speedup of 3.39 times.

Key words: Matrix multiplication, Polyhedron model, Vectorized, Scheduling transformation, Code generation

CLC Number: 

  • TP312
[1]KANG H,KWON H C,KIM D.HPMaX:heterogeneous parallel matrix multiplication using CPUs and GPUs[J].Computing,2020,102(12):2607-2631.
[2]HEMEIDA A M,HASSAN S A,ALKHALAF S,et al.Optimi-zing matrix-matrix multiplication on intel'sadvanced vector extensions multicore processor[J].Ain Shams Engineering Journal,2020,11(4):1179-1190.
[3]HOSSEIN A,ASADOLLAH S.SIMD programming using Intelvector extensions - ScienceDirect[J].Journal of Parallel and Distributed Computing,2020,135:83-100.
[4]FEAUTRIER P,LENGAUER C.Polyhedron model[J/OL].
[5]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Po-lyhedral parallel code generation for CUDA[J].ACM Transactions on Architecture and Code Optimization(TACO),2013,9(4):54:1-54:24.
[6]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI'08).ACM,2008.
[7]GAO W,ZHAO R C,HAN L,et al.Overview of SIMD automa-tic vectorization compilation optimization [J].Journal of Software,2015,26(6):1265-1284.
[8]EDAMATSU T,TAKAHASHI D.Accelerating Large Integer Multiplication Using Intel AVX-512IFMA[M].Algorithms and Architectures for Parallel Processing,2020:60-74.
[9]STEPHENS N,BILES S,BOETTCHER M,et al.The ARMScalable Vector Extension[J].IEEE Micro,2017,37(2):26-39.
[10]ZHAO J,LI Y Y,ZHAO R C.“Black magic” of polyhedral compilation[J].Journal of Software,2018,29(8):2371-2396.
[11]VERDOOLAEGE S,GUELTON S,GROSSER T.ScheduleTrees[J].Geodinamica Acta,2013,25(1/2):86-95.
[12]GROSSER T,VERDOOLAEGE S,COHEN A.Polyhedral AST generation is more than scanning polyhedra[J].Acm Transactions on Programming Languages & Systems,2015,37(4):1-50.
[13]KONG M,VERAS R,STOCK K,et al.When polyhedral transformations meet SIMD code generation[C]// Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation.ACM,2013.
[14]LENGAUER G C.Polly-Performing Polyhedral Optimizations on a Low-Level Intermediate Representation[J].Parallel Processing Letters,2012,22(4):14-53.
[16]LIU Z,TIAN X.Matrix multiplication vectorization method for multi-core vector processors[J].Chinese Journal of Computers,2018,41(10):2251-2264.
[17]LIU G,ZHANG H,MAO R,et al.Optimization of DGEMMFunction for Loongson3B1500 Architecture[J].Small Microcomputer System,2014,35(7):1523-1527.
[1] CHEN Tao, SHU Hui, XIONG Xiao-bing. Study of Universal Shellcode Generation Technology [J]. Computer Science, 2021, 48(4): 288-294.
[2] HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28.
[3] HAN Xiao-dong, GAO Fei, ZHANG Li-wei. Novel Real-time Algorithm for Critical Path of Linear Network Coding [J]. Computer Science, 2020, 47(9): 232-237.
[4] YANG Ping, WANG Sheng-yuan. Analysis of Target Code Generation Mechanism of CompCert Compiler [J]. Computer Science, 2020, 47(9): 17-23.
[5] DING Rong, YU Qian-hui. Growth Framework of Autonomous Unmanned Systems Based on AADL [J]. Computer Science, 2020, 47(12): 87-92.
[6] YANG Fei, MA Yu-chun, HOU Jin and XU Ning. Research on Acceleration of Matrix Multiplication Based on Parallel Scheduling on MPSoC [J]. Computer Science, 2017, 44(8): 36-41.
[7] WU Wei-zu, LIU Li-qun and XIE Dong-qing. Vectorized Representation of Heterogeneous Network Based on Neural Networks [J]. Computer Science, 2017, 44(5): 272-275.
[8] GAO Wei, ZHAO Rong-cai, YU Hai-ning and ZHANG Qing-hua. Loop Unrolling in Vectorized Programs [J]. Computer Science, 2016, 43(1): 226-231.
[9] LI Peng-yuan, ZHAO Rong-cai, GAO Wei and ZHANG Qing-hua. Effective Vectorization Technique for Interleaved Data with Constant Strides [J]. Computer Science, 2015, 42(5): 194-199.
[10] YIN Meng-jia, XU Xian-bin, XIONG Zeng-gang and ZHANG Tao. Quantitative Performance Analysis Model of Matrix Multiplication Based on GPU [J]. Computer Science, 2015, 42(12): 13-17.
[11] XIANG Lin-hong,CHEN Yu-wen and ZHANG Yu-lin. High Order Matrix Multiplication by MapReduce Algorithm Based on Hadoop Platform [J]. Computer Science, 2013, 40(Z6): 96-98.
[12] ZHANG Li-yong CHEN Ping (Software Engineering Institute, Xidian University, Xi'an 710071, China ). [J]. Computer Science, 2008, 35(5): 284-287.
Full text



No Suggested Reading articles found!