Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240300057-10. doi: 10.11896/jsjkx.240300057

• Computer Software & Architecture •

Automatic Mixing Precision Optimization for Matrix Multiplication Calculation

HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen   

  1. School of Cyberspace Security, University of Information Engineering, Zhengzhou 450001, China
  • Online: 2024-11-16 Published: 2024-11-13
  • About author: HE Haotian, born in 1997, postgraduate. His main research interest is high-performance computing.
    XU Jinchen, born in 1987, Ph.D., associate professor. His main research interest is high-performance computing.

Abstract: Mixed-precision optimization greatly improves the performance of matrix multiplication, but it also introduces errors relative to a high-precision computation. To reduce these errors effectively, this paper implements AMAO, an automatic mixed-precision tool for matrix multiplication. Starting from a basic mixed-precision calculation that multiplies in low precision and accumulates in high precision, the tool applies a precision optimization algorithm based on iteration-space partitioning to divide the original basic mixed-precision calculation into two parts according to a set proportion: one part is computed in high precision, and the other keeps the basic mixed-precision method. An automatic mixed-precision code generation tool is implemented on top of this algorithm. Experiments show that, compared with the mixed-precision tool AGMMMPC, the mixed-precision code generated by AMAO is 5.90% slower on average, while its accuracy improves by 49.31% on average.
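
The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not AMAO's generated code: the abstract does not name the precision formats or the loop dimension that is split, so the sketch assumes binary16 as the low precision, binary64 as the high precision, and a split along the row loop. The names `to_half`, `matmul_split`, `max_abs_error` and the parameter `ratio` are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def matmul_split(a, b, ratio):
    """Multiply a (n x k) by b (k x m), both given as lists of lists.

    Assumption: the iteration space is split along the row loop only.
    Rows with index < round(n * ratio) use high precision throughout;
    the remaining rows use the basic mixed-precision scheme, where each
    product is formed in half precision and accumulated in double.
    """
    n, k, m = len(a), len(b), len(b[0])
    split = round(n * ratio)
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        high = i < split
        for j in range(m):
            acc = 0.0  # the accumulator is always high precision
            for p in range(k):
                if high:
                    acc += a[i][p] * b[p][j]
                else:
                    # Low-precision multiply: round operands and product.
                    acc += to_half(to_half(a[i][p]) * to_half(b[p][j]))
            c[i][j] = acc
    return c


def max_abs_error(c, ref):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(x - y) for rc, rr in zip(c, ref) for x, y in zip(rc, rr))


if __name__ == "__main__":
    a = [[0.1 * (i + p + 1) for p in range(4)] for i in range(4)]
    b = [[0.1 * (p + j + 1) for j in range(4)] for p in range(4)]
    ref = matmul_split(a, b, 1.0)  # all rows in high precision
    for ratio in (0.0, 0.5, 1.0):
        err = max_abs_error(matmul_split(a, b, ratio), ref)
        print(f"ratio={ratio:.1f} max abs error={err:.3e}")
```

Raising `ratio` moves more iterations into the high-precision part, which trades speed for accuracy in the same direction as the results reported above; the actual tool derives the split from its precision optimization algorithm rather than from a hand-picked ratio.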

Key words: Mixed precision, Matrix multiplication, Polyhedral model, Scheduling transformation, Code generation

CLC Number: 

  • TP314