计算机科学 ›› 2024, Vol. 51 ›› Issue (12): 110-119.doi: 10.11896/jsjkx.230800106
何昊天, 周蓓, 郭绍忠, 张作言, 郝江伟, 许瑾晨
HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, XU Jinchen
摘要: 混合精度是计算机中的一种数值计算技术,通过将计算中的部分数据类型从高精度转换成低精度来提高计算效率。矩阵乘法在计算机科学和数学中有着重要而广泛的应用,在矩阵乘法中使用混合精度技术来加速计算过程是一项很有挑战性的工作。现有的混合精度优化存在一些问题,例如存储开销大,必须在特定的硬件单元上实现,限制了模型或算法的部署选项并降低了其可移植性。针对上述问题,提出并实现了基于多面体模型的混合精度代码自动生成工具AGMMMPC。通过将低精度乘高精度加基础混合精度矩阵乘代码生成功能添加到“源-源”的PPCG编译器中,并使用精度调优算法(Precision Tuning,PT)找到基础混合精度计算中的高频误差点,将这些点用高精度计算,其余点用基础混合精度计算,有效减小基础混合精度计算中的误差,首次实现了源到源的面向矩阵乘计算的混合精度代码自动生成。实验表明,以高精度计算为基准,AGMMMPC生成的高级混合精度代码在X86平台上的最大加速比为1.39,几何平均加速比为1.14。
中图分类号:
[1]FAWZI A,BALOG M,HUANG A.et al.Discovering faster matrix multiplication algorithms with reinforcement learning[J].Nature,2022,610:47-53. [2]MICIKEVICIUS P,MARANG S,ALBEN J,et al.Mixed precision training for deep neural networks[J].arXiv:1710.03740,2017. [3]HUANG X,ZHANG J,CHEN T,et al.Mixed-Precision Trai-ning for NLP and Speech Recognition with OpenSeq2Seq[J].ar-Xiv:2010.12895,2018. [4]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Polyhedral Parallel Code Generation for CUDA[J].ACM Transactions on Architecture & Code Optimization,2013,9(4):1-23. [5]DUGHMI S,XU H F.Algorithmic Bayesian persuasion[C]//Proceedings of the forty-eighth annual ACM symposium on Theory of Computing(STOC’16).Association for Computing Machinery,New York,NY,USA,2016:412-425. [6]DEMMEL J.Applied Numerical Linear Algebra[M].Tsinghua University Press:Siam,2011. [7]WANG B Y,PANG J M,XU J L,et al.Matrix multiplication vector code generation based on polyhedral model[J].Computer Science,2022,49(10):44-51. [8]KOTIPALLI P V,SINGH R,WOOD P,et al.AMPT-GA:Automatic mixed precision floating point tuning for GPU applications[C]//Proceedings of the ACM International Conference on Supercomputing(ICS’19).Phoenix Arizona:ACM,2019:160-170. [9]SOLOVYEV A,JACOBSEN C,RAKAMARIĆZ,et al.Rigorousestimation of floating-point round-off errors with symbolic taylor expansions[M].Cham:Springer International Publishing,2015:532-550. [10]DARULOVA E,HORN E,SHARMA S.Sound mixed-precision optimization with rewriting[C]//2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems(ICCPS).IEEE,2018:208-219. [11]DARULOVA E,KUNCAK V.Towards a compiler for reals[J].ACM Transactions on Programming Languages and Systems(TOPLAS),2017,39(2):1-28. [12]CHERUBIN S,CATTANEO D,CHIARI M,et al.TAFFO:Tuning assistant for floating to fixed point optimization[J].IEEE Embedded Systems Letters,2019,12(1):5-8. [13]RUBIO-GONZÁLEZ C,NGUYEN C,NGUYEN H D,et al.Precimonious:Tuning assistant for floating-point precision[C]//Proceedings of the International Conference on High Performance Computing,Networking,Storage and Analysis.IEEE,2013:1-12. [14]RUBIO-GONZÁLEZ C,NGUYEN C,MEHNE B,et al.Floa-ting-point precision tuning using blame analysis[C]//2016 IEEE/ACM 38th International Conference on Software Engineering(ICSE).IEEE,2016:1074-1085. [15]MENON H,LAM M O,OSEI-KUFFUOR D,et al.ADAPT:Algorithmic differentiation applied to floatingpoint precision tuning[C]//International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2018:614-626. [16]HO N M,MANOGARAN E,WONG W F,et al.Efficient floa-ting point precision tuning for approximate computing[C]//2017 22nd Asia and South Pacific Design Automation Conference(ASP-DAC).IEEE,2017:63-68. [17]LAGUNA I,WOOD P C,SINGH R,et al.Gpumixer:Perfor-mance-driven floating-point tuning for gpu scientific applications[C]//International Conference on High Performance Computing.Cham:Springer,2019:227-246. [18]FEAUTRIER P,LENGAUER C.Polyhedron model[C]//Proceedings of the Encyclopedia of Parallel Computing.2011:1581-1592. [19]ZHAO J,LI Y Y,ZHAO R C.Compilation “black magic” based on polyhedral model[J].Journal of Software,2018,29(8):2371-2396. [20]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 29th ACM SIGPLAN ConfErence on Programming Language Design and Implementation(PLDI).ACM Press,2008:101-113. [21]TRIFUNOVIĆ K,COHEN A,EDELSOHN D,et al.Graphite two years a fter:First lessons learned from real-world polyhe-dral compilation[C]//Proceedings of the 2nd GCC Research Opportunities Workshop(GROW).2010. [22]GROSSER T,GROESSLINGER A,LENGAUER C.Polly-Performing polyhedral optimizations on a low-level intermediate representation[J].Parallel Processing Letters,2012,22(4):1250010. [23]KELLY W,PUGH W.A unifying framework for iteration reordering transformations[C]//Proceedings of the IEEE 1st International Conference on Algorithms and Architectures for Parallel Processing(ICAPP’95).1995. [24]GIRBAL S,VASILACHE N,BASTOUL C,et al.Semi-Auto-matic Composition of Loop Transformations for Deep Paralle-lism and Memory Hierarchies[J].International Journal of Parallel Programming,2006,34(3):261-317. [25]VERDOOLAEGE S.Counting affine calculator and applications[C]//Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques(IMPACT’11).Charmonix,2011. [26]GUO H,RUBIO-GONZÁLEZ C.Exploiting community struc-ture for floating-point precision tuning[C]//Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis.Amsterdam Netherlands:ACM,2018:333-343. [27]VERDOOLAEGE S,GROSSER T.Polyhedral extraction tool[C]//Proceedings of the 2nd Int’l Workshop on Polyhedra l Compilation Techniques(IMPACT).2012. [28]VERDOOLAEGE S.ISL:An integer set library for the polyhe-dral model[C]//Proceedings of the ICMS 2010.Berlin,Heidelberg:Springer-Verlag,2010:299-302. [29]SONG G H,GUO S Z,ZHAO J,et al.Automatic hybrid accuracy optimization for Stencil computing[J].Journal of Software,2023,34(12):5704-5723. |
|