Automatic Mixing Precision Optimization for Matrix Multiplication Calculation

Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240300057-10. doi: 10.11896/jsjkx.240300057

• Computer Software & Architecture •

HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen

School of Cyberspace Security, University of Information Engineering, Zhengzhou 450001, China

Online: 2024-11-16 Published: 2024-11-13
Corresponding author: XU Jinchen (atao728208@126.com)
Author e-mail: m18503880251@163.com
About author: HE Haotian, born in 1997, postgraduate. His main research interest is high-performance computing.
XU Jinchen, born in 1987, Ph.D., associate professor. His main research interest is high-performance computing.

Abstract

Abstract: Mixed-precision optimization greatly improves the performance of matrix multiplication, but it introduces errors relative to high-precision matrix multiplication. To effectively reduce these errors, this paper implements AMAO, an automatic mixed-precision tool for matrix multiplication. Starting from a basic mixed-precision computation that multiplies in low precision and adds in high precision, the tool applies a precision optimization algorithm based on iteration-space partitioning: the original basic mixed-precision computation is divided into two parts according to a given ratio, one part computed in high precision and the other in basic mixed precision. An automatic mixed-precision code generation tool is implemented on the basis of this algorithm. Experiments show that, compared with the mixed-precision tool AGMMMPC, the performance of the mixed-precision code generated by AMAO is reduced by 5.90% on average, while its accuracy is improved by 49.31% on average.

Key words: Mixed precision, Matrix multiplication, Polyhedral model, Scheduling transformation, Code generation
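
The partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not AMAO's generated code and does not use the polyhedral model; it only shows one way an iteration space could be split by a ratio. The following are assumptions made for illustration: binary32 stands in for "low precision" and binary64 for "high precision", the split is applied to the reduction loop, and the names `to_f32`, `matmul_partitioned`, and `ratio` are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul_partitioned(A, B, ratio):
    """Compute C = A * B with the reduction loop split by `ratio`.

    Assumption: the first int(ratio * k) reduction steps run entirely in
    binary64 ("high precision"); the remaining steps use binary32 products
    accumulated in binary64 ("basic mixed precision").
    """
    n, k, m = len(A), len(B), len(B[0])
    split = int(ratio * k)  # size of the high-precision part
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0  # accumulator is always binary64
            # Part 1: high-precision multiply and add.
            for p in range(split):
                acc += A[i][p] * B[p][j]
            # Part 2: low-precision multiply, high-precision add.
            for p in range(split, k):
                prod = to_f32(to_f32(A[i][p]) * to_f32(B[p][j]))
                acc += prod
            C[i][j] = acc
    return C


if __name__ == "__main__":
    A = [[0.1, 0.2], [0.3, 0.4]]
    B = [[0.5, 0.6], [0.7, 0.8]]
    for r in (0.0, 0.5, 1.0):
        print(r, matmul_partitioned(A, B, r))
```

In this sketch, `ratio = 1.0` reproduces plain binary64 matrix multiplication and `ratio = 0.0` reproduces the basic mixed-precision computation; values in between trade speed for accuracy, which is the trade-off the reported results quantify.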

CLC Number: TP314

HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen. Automatic Mixing Precision Optimization for Matrix Multiplication Calculation[J]. Computer Science, 2024, 51(11A): 240300057-10. https://doi.org/10.11896/jsjkx.240300057

