计算机科学 ›› 2024, Vol. 51 ›› Issue (12): 110-119.doi: 10.11896/jsjkx.230800106

• 高性能计算 • 上一篇    下一篇

基于多面体模型的矩阵乘法自动混合精度优化

何昊天, 周蓓, 郭绍忠, 张作言, 郝江伟, 许瑾晨   

  1. 信息工程大学网络空间安全学院 郑州 450001
  • 收稿日期:2023-08-16 修回日期:2024-01-15 出版日期:2024-12-15 发布日期:2024-12-10
  • 通讯作者: 许瑾晨(atao728208@126.com)
  • 作者简介:(m18503880251@163.com)

Optimisation of Automatic Matrix Multiplication Mixing Accuracy Based on Polyhedral Models

HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, XU Jinchen   

  1. School of Cyberspace Security, University of Information Engineering, Zhengzhou 450001, China
  • Received:2023-08-16 Revised:2024-01-15 Online:2024-12-15 Published:2024-12-10
  • About author:HE Haotian,born in 1997,postgra-duate.His main research interest is high-performance computing.
    XU Jinchen,born in 1987,Ph.D,asso-ciate professor.His main research in-terest is high-performance computing.

摘要: 混合精度是计算机中的一种数值计算技术,通过将计算中的部分数据类型从高精度转换成低精度来提高计算效率。矩阵乘法在计算机科学和数学中有着重要而广泛的应用,在矩阵乘法中使用混合精度技术来加速计算过程是一项很有挑战性的工作。现有的混合精度优化存在一些问题,例如存储开销大,必须在特定的硬件单元上实现,限制了模型或算法的部署选项并降低了其可移植性。针对上述问题,提出并实现了基于多面体模型的混合精度代码自动生成工具AGMMMPC。通过将低精度乘高精度加基础混合精度矩阵乘代码生成功能添加到“源-源”的PPCG编译器中,并使用精度调优算法(Precision Tuning,PT)找到基础混合精度计算中的高频误差点,将这些点用高精度计算,其余点用基础混合精度计算,有效减小基础混合精度计算中的误差,首次实现了源到源的面向矩阵乘计算的混合精度代码自动生成。实验表明,以高精度计算为基准,AGMMMPC生成的高级混合精度代码在X86平台上的最大加速比为1.39,几何平均加速比为1.14。

关键词: 混合精度, 矩阵乘法, 多面体模型, 调度变换, 代码生成

Abstract: Mixed precision is a numerical computation technique in computers that improves the efficiency of computation by converting some of the data types in the computation from high precision to low precision.Matrix multiplication has an important and wide application in computer science and mathematics,using mixed precision techniques in matrix multiplication to speed up the computational process is a challenging task.Existing mixed-precision optimisation suffers from several problems,such as high storage overhead,having to be implemented on specific hardware units,limiting the deployment options of models or algorithms and reducing their portability.In the face of the above problems,this paper proposes and implements an automatic mixed-precision code generation tool based on polyhedral models AGMMMPC.By adding the low-precision-by-high-precision-plus-basic-mixed-precision matrix multiplication code generation functionality to the source-source PPCG compiler,and using the precision tuning(PT) algorithm to find high-frequency errors in the basic mixed-precision computation.These points are processed by high precision calculation method,while the rest are processed by the basic mixed precision calculation method,which effectively reduces the error in the basic mixed precision calculation,and realises the automatic generation of source-to-source mixed-precision code for matrix multiplication for the first time.Experiments show that the maximum acceleration ratio of the advanced mixed-precision code generated by AGMMMPC is 1.39 and the geometric mean acceleration ratio is 1.14 on X86 platform with high-precision computation as the benchmark.

Key words: Mixed precision, Matrix multiplication, Polyhedral model, Scheduling transformation, Code Generation

中图分类号: 

  • TP314
[1]FAWZI A,BALOG M,HUANG A.et al.Discovering faster matrix multiplication algorithms with reinforcement learning[J].Nature,2022,610:47-53.
[2]MICIKEVICIUS P,MARANG S,ALBEN J,et al.Mixed precision training for deep neural networks[J].arXiv:1710.03740,2017.
[3]HUANG X,ZHANG J,CHEN T,et al.Mixed-Precision Trai-ning for NLP and Speech Recognition with OpenSeq2Seq[J].ar-Xiv:2010.12895,2018.
[4]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Polyhedral Parallel Code Generation for CUDA[J].ACM Transactions on Architecture & Code Optimization,2013,9(4):1-23.
[5]DUGHMI S,XU H F.Algorithmic Bayesian persuasion[C]//Proceedings of the forty-eighth annual ACM symposium on Theory of Computing(STOC’16).Association for Computing Machinery,New York,NY,USA,2016:412-425.
[6]DEMMEL J.Applied Numerical Linear Algebra[M].Tsinghua University Press:Siam,2011.
[7]WANG B Y,PANG J M,XU J L,et al.Matrix multiplication vector code generation based on polyhedral model[J].Computer Science,2022,49(10):44-51.
[8]KOTIPALLI P V,SINGH R,WOOD P,et al.AMPT-GA:Automatic mixed precision floating point tuning for GPU applications[C]//Proceedings of the ACM International Conference on Supercomputing(ICS’19).Phoenix Arizona:ACM,2019:160-170.
[9]SOLOVYEV A,JACOBSEN C,RAKAMARIĆZ,et al.Rigorousestimation of floating-point round-off errors with symbolic taylor expansions[M].Cham:Springer International Publishing,2015:532-550.
[10]DARULOVA E,HORN E,SHARMA S.Sound mixed-precision optimization with rewriting[C]//2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems(ICCPS).IEEE,2018:208-219.
[11]DARULOVA E,KUNCAK V.Towards a compiler for reals[J].ACM Transactions on Programming Languages and Systems(TOPLAS),2017,39(2):1-28.
[12]CHERUBIN S,CATTANEO D,CHIARI M,et al.TAFFO:Tuning assistant for floating to fixed point optimization[J].IEEE Embedded Systems Letters,2019,12(1):5-8.
[13]RUBIO-GONZÁLEZ C,NGUYEN C,NGUYEN H D,et al.Precimonious:Tuning assistant for floating-point precision[C]//Proceedings of the International Conference on High Performance Computing,Networking,Storage and Analysis.IEEE,2013:1-12.
[14]RUBIO-GONZÁLEZ C,NGUYEN C,MEHNE B,et al.Floa-ting-point precision tuning using blame analysis[C]//2016 IEEE/ACM 38th International Conference on Software Engineering(ICSE).IEEE,2016:1074-1085.
[15]MENON H,LAM M O,OSEI-KUFFUOR D,et al.ADAPT:Algorithmic differentiation applied to floatingpoint precision tuning[C]//International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2018:614-626.
[16]HO N M,MANOGARAN E,WONG W F,et al.Efficient floa-ting point precision tuning for approximate computing[C]//2017 22nd Asia and South Pacific Design Automation Conference(ASP-DAC).IEEE,2017:63-68.
[17]LAGUNA I,WOOD P C,SINGH R,et al.Gpumixer:Perfor-mance-driven floating-point tuning for gpu scientific applications[C]//International Conference on High Performance Computing.Cham:Springer,2019:227-246.
[18]FEAUTRIER P,LENGAUER C.Polyhedron model[C]//Proceedings of the Encyclopedia of Parallel Computing.2011:1581-1592.
[19]ZHAO J,LI Y Y,ZHAO R C.Compilation “black magic” based on polyhedral model[J].Journal of Software,2018,29(8):2371-2396.
[20]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 29th ACM SIGPLAN ConfErence on Programming Language Design and Implementation(PLDI).ACM Press,2008:101-113.
[21]TRIFUNOVIĆ K,COHEN A,EDELSOHN D,et al.Graphite two years a fter:First lessons learned from real-world polyhe-dral compilation[C]//Proceedings of the 2nd GCC Research Opportunities Workshop(GROW).2010.
[22]GROSSER T,GROESSLINGER A,LENGAUER C.Polly-Performing polyhedral optimizations on a low-level intermediate representation[J].Parallel Processing Letters,2012,22(4):1250010.
[23]KELLY W,PUGH W.A unifying framework for iteration reordering transformations[C]//Proceedings of the IEEE 1st International Conference on Algorithms and Architectures for Parallel Processing(ICAPP’95).1995.
[24]GIRBAL S,VASILACHE N,BASTOUL C,et al.Semi-Auto-matic Composition of Loop Transformations for Deep Paralle-lism and Memory Hierarchies[J].International Journal of Parallel Programming,2006,34(3):261-317.
[25]VERDOOLAEGE S.Counting affine calculator and applications[C]//Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques(IMPACT’11).Charmonix,2011.
[26]GUO H,RUBIO-GONZÁLEZ C.Exploiting community struc-ture for floating-point precision tuning[C]//Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis.Amsterdam Netherlands:ACM,2018:333-343.
[27]VERDOOLAEGE S,GROSSER T.Polyhedral extraction tool[C]//Proceedings of the 2nd Int’l Workshop on Polyhedra l Compilation Techniques(IMPACT).2012.
[28]VERDOOLAEGE S.ISL:An integer set library for the polyhe-dral model[C]//Proceedings of the ICMS 2010.Berlin,Heidelberg:Springer-Verlag,2010:299-302.
[29]SONG G H,GUO S Z,ZHAO J,et al.Automatic hybrid accuracy optimization for Stencil computing[J].Journal of Software,2023,34(12):5704-5723.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!