Computer Science, 2024, Vol. 51, Issue (11A): 240300057-10. doi: 10.11896/jsjkx.240300057
何昊天, 周蓓, 郭绍忠, 张作言, 郝江伟, 冀立光, 许瑾晨
HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen
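Abstract: Mixed-precision optimization of matrix multiplication greatly improves its performance, but it introduces errors compared with high-precision matrix multiplication. To reduce these errors, this paper implements AMAO, an automatic mixed-precision tool for matrix multiplication. AMAO starts from basic mixed-precision computation, which multiplies in low precision and accumulates in high precision. It then applies a precision optimization algorithm based on iteration-space partitioning. This algorithm divides the original basic mixed-precision computation into two parts according to a given ratio: one part is computed in high precision and the other in basic mixed precision. Based on this algorithm, a tool that automatically generates mixed-precision code is implemented. Experiments show that, compared with the mixed-precision tool AGMMMPC, the mixed-precision code generated by AMAO has 5.90% lower performance on average, while its accuracy is 49.31% higher on average.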
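The abstract does not say which loop of the iteration space is partitioned or which hardware types serve as the low and high precisions. The Python sketch below illustrates the idea under stated assumptions. It is not AMAO's implementation. Binary32, emulated through the standard `struct` module, stands in for the low precision, and Python's binary64 `float` stands in for the high precision. The parameter `ratio` and the choice to split the reduction loop `k` are hypothetical. The first `round(ratio * K)` reduction steps run fully in high precision. The remaining steps use basic mixed precision: operands and products are rounded to binary32, and accumulation stays in binary64.

```python
import struct
from typing import List

Matrix = List[List[float]]


def to_low(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (assumed low precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul_partitioned(A: Matrix, B: Matrix, ratio: float) -> Matrix:
    """Compute C = A * B (A is M x K, B is K x N) with the k loop split in two.

    Hypothetical sketch, not AMAO's algorithm:
      - k in [0, k_split): high precision, binary64 multiply and add;
      - k in [k_split, K): basic mixed precision, operands and product
        rounded to binary32, accumulation kept in binary64.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    M, K, N = len(A), len(B), len(B[0])
    k_split = int(round(ratio * K))  # reduction steps done in high precision
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0  # high-precision accumulator shared by both parts
            for k in range(k_split):  # high-precision part of the iteration space
                acc += A[i][k] * B[k][j]
            for k in range(k_split, K):  # basic mixed-precision part
                acc += to_low(to_low(A[i][k]) * to_low(B[k][j]))
            C[i][j] = acc
    return C


def max_abs_error(C: Matrix, ref: Matrix) -> float:
    """Largest element-wise absolute difference between C and a reference."""
    return max(abs(c - r) for row_c, row_r in zip(C, ref) for c, r in zip(row_c, row_r))


if __name__ == "__main__":
    A = [[0.1, 0.1]]
    B = [[0.1], [0.1]]
    ref = matmul_partitioned(A, B, 1.0)    # all-high-precision reference
    half = matmul_partitioned(A, B, 0.5)   # half of the k loop in high precision
    basic = matmul_partitioned(A, B, 0.0)  # plain basic mixed precision
    print(max_abs_error(half, ref) < max_abs_error(basic, ref))  # True
```

Raising `ratio` moves more of the reduction into the high-precision part. This trades speed for accuracy, which matches the trade-off the abstract reports. The best split point and the real data types would have to come from the paper itself.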