Computer Science ›› 2024, Vol. 51 ›› Issue (12): 110-119.doi: 10.11896/jsjkx.230800106

• High Performance Computing • Previous Articles     Next Articles

Optimisation of Automatic Matrix Multiplication Mixing Accuracy Based on Polyhedral Models

HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, XU Jinchen   

  1. School of Cyberspace Security, University of Information Engineering, Zhengzhou 450001, China
  • Received:2023-08-16 Revised:2024-01-15 Online:2024-12-15 Published:2024-12-10
  • About author:HE Haotian,born in 1997,postgra-duate.His main research interest is high-performance computing.
    XU Jinchen,born in 1987,Ph.D,asso-ciate professor.His main research in-terest is high-performance computing.

Abstract: Mixed precision is a numerical computation technique in computers that improves the efficiency of computation by converting some of the data types in the computation from high precision to low precision.Matrix multiplication has an important and wide application in computer science and mathematics,using mixed precision techniques in matrix multiplication to speed up the computational process is a challenging task.Existing mixed-precision optimisation suffers from several problems,such as high storage overhead,having to be implemented on specific hardware units,limiting the deployment options of models or algorithms and reducing their portability.In the face of the above problems,this paper proposes and implements an automatic mixed-precision code generation tool based on polyhedral models AGMMMPC.By adding the low-precision-by-high-precision-plus-basic-mixed-precision matrix multiplication code generation functionality to the source-source PPCG compiler,and using the precision tuning(PT) algorithm to find high-frequency errors in the basic mixed-precision computation.These points are processed by high precision calculation method,while the rest are processed by the basic mixed precision calculation method,which effectively reduces the error in the basic mixed precision calculation,and realises the automatic generation of source-to-source mixed-precision code for matrix multiplication for the first time.Experiments show that the maximum acceleration ratio of the advanced mixed-precision code generated by AGMMMPC is 1.39 and the geometric mean acceleration ratio is 1.14 on X86 platform with high-precision computation as the benchmark.

Key words: Mixed precision, Matrix multiplication, Polyhedral model, Scheduling transformation, Code Generation

CLC Number: 

  • TP314
[1]FAWZI A,BALOG M,HUANG A.et al.Discovering faster matrix multiplication algorithms with reinforcement learning[J].Nature,2022,610:47-53.
[2]MICIKEVICIUS P,MARANG S,ALBEN J,et al.Mixed precision training for deep neural networks[J].arXiv:1710.03740,2017.
[3]HUANG X,ZHANG J,CHEN T,et al.Mixed-Precision Trai-ning for NLP and Speech Recognition with OpenSeq2Seq[J].ar-Xiv:2010.12895,2018.
[4]VERDOOLAEGE S,CARLOS JUEGA J,COHEN A,et al.Polyhedral Parallel Code Generation for CUDA[J].ACM Transactions on Architecture & Code Optimization,2013,9(4):1-23.
[5]DUGHMI S,XU H F.Algorithmic Bayesian persuasion[C]//Proceedings of the forty-eighth annual ACM symposium on Theory of Computing(STOC’16).Association for Computing Machinery,New York,NY,USA,2016:412-425.
[6]DEMMEL J.Applied Numerical Linear Algebra[M].Tsinghua University Press:Siam,2011.
[7]WANG B Y,PANG J M,XU J L,et al.Matrix multiplication vector code generation based on polyhedral model[J].Computer Science,2022,49(10):44-51.
[8]KOTIPALLI P V,SINGH R,WOOD P,et al.AMPT-GA:Automatic mixed precision floating point tuning for GPU applications[C]//Proceedings of the ACM International Conference on Supercomputing(ICS’19).Phoenix Arizona:ACM,2019:160-170.
[9]SOLOVYEV A,JACOBSEN C,RAKAMARIĆZ,et al.Rigorousestimation of floating-point round-off errors with symbolic taylor expansions[M].Cham:Springer International Publishing,2015:532-550.
[10]DARULOVA E,HORN E,SHARMA S.Sound mixed-precision optimization with rewriting[C]//2018 ACM/IEEE 9th International Conference on Cyber-Physical Systems(ICCPS).IEEE,2018:208-219.
[11]DARULOVA E,KUNCAK V.Towards a compiler for reals[J].ACM Transactions on Programming Languages and Systems(TOPLAS),2017,39(2):1-28.
[12]CHERUBIN S,CATTANEO D,CHIARI M,et al.TAFFO:Tuning assistant for floating to fixed point optimization[J].IEEE Embedded Systems Letters,2019,12(1):5-8.
[13]RUBIO-GONZÁLEZ C,NGUYEN C,NGUYEN H D,et al.Precimonious:Tuning assistant for floating-point precision[C]//Proceedings of the International Conference on High Performance Computing,Networking,Storage and Analysis.IEEE,2013:1-12.
[14]RUBIO-GONZÁLEZ C,NGUYEN C,MEHNE B,et al.Floa-ting-point precision tuning using blame analysis[C]//2016 IEEE/ACM 38th International Conference on Software Engineering(ICSE).IEEE,2016:1074-1085.
[15]MENON H,LAM M O,OSEI-KUFFUOR D,et al.ADAPT:Algorithmic differentiation applied to floatingpoint precision tuning[C]//International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2018:614-626.
[16]HO N M,MANOGARAN E,WONG W F,et al.Efficient floa-ting point precision tuning for approximate computing[C]//2017 22nd Asia and South Pacific Design Automation Conference(ASP-DAC).IEEE,2017:63-68.
[17]LAGUNA I,WOOD P C,SINGH R,et al.Gpumixer:Perfor-mance-driven floating-point tuning for gpu scientific applications[C]//International Conference on High Performance Computing.Cham:Springer,2019:227-246.
[18]FEAUTRIER P,LENGAUER C.Polyhedron model[C]//Proceedings of the Encyclopedia of Parallel Computing.2011:1581-1592.
[19]ZHAO J,LI Y Y,ZHAO R C.Compilation “black magic” based on polyhedral model[J].Journal of Software,2018,29(8):2371-2396.
[20]BONDHUGULA U,HARTONO A,RAMANUJAM J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]//Proceedings of the 29th ACM SIGPLAN ConfErence on Programming Language Design and Implementation(PLDI).ACM Press,2008:101-113.
[21]TRIFUNOVIĆ K,COHEN A,EDELSOHN D,et al.Graphite two years a fter:First lessons learned from real-world polyhe-dral compilation[C]//Proceedings of the 2nd GCC Research Opportunities Workshop(GROW).2010.
[22]GROSSER T,GROESSLINGER A,LENGAUER C.Polly-Performing polyhedral optimizations on a low-level intermediate representation[J].Parallel Processing Letters,2012,22(4):1250010.
[23]KELLY W,PUGH W.A unifying framework for iteration reordering transformations[C]//Proceedings of the IEEE 1st International Conference on Algorithms and Architectures for Parallel Processing(ICAPP’95).1995.
[24]GIRBAL S,VASILACHE N,BASTOUL C,et al.Semi-Auto-matic Composition of Loop Transformations for Deep Paralle-lism and Memory Hierarchies[J].International Journal of Parallel Programming,2006,34(3):261-317.
[25]VERDOOLAEGE S.Counting affine calculator and applications[C]//Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques(IMPACT’11).Charmonix,2011.
[26]GUO H,RUBIO-GONZÁLEZ C.Exploiting community struc-ture for floating-point precision tuning[C]//Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis.Amsterdam Netherlands:ACM,2018:333-343.
[27]VERDOOLAEGE S,GROSSER T.Polyhedral extraction tool[C]//Proceedings of the 2nd Int’l Workshop on Polyhedra l Compilation Techniques(IMPACT).2012.
[28]VERDOOLAEGE S.ISL:An integer set library for the polyhe-dral model[C]//Proceedings of the ICMS 2010.Berlin,Heidelberg:Springer-Verlag,2010:299-302.
[29]SONG G H,GUO S Z,ZHAO J,et al.Automatic hybrid accuracy optimization for Stencil computing[J].Journal of Software,2023,34(12):5704-5723.
[1] GUO Shuaizhe, GAO Jianhua, JI Weixing. Optimizing Distributed GMRES Algorithm with Mixed Precision [J]. Computer Science, 2024, 51(9): 15-22.
[2] LING Shixiang, YANG Zhibin, ZHOU Yong. Integrated Avionics Software Code Automatic Generation Method for ARINC653 Operating System [J]. Computer Science, 2024, 51(7): 10-21.
[3] LEI Chao, LIU Jiang, SONG Jiawen. Time Cost Model and Optimal Configuration Method for GPU Parallel Computation of Matrix Multiplication [J]. Computer Science, 2024, 51(6A): 230300200-8.
[4] XU Yiran, ZHOU Yu. Prompt Learning Based Parameter-efficient Code Generation [J]. Computer Science, 2024, 51(6): 61-67.
[5] LIAO Qihua, NIE Kai, HAN Lin, CHEN Mengyao, XIE Wenbing. Tile Selection Algorithm Based on Data Locality [J]. Computer Science, 2024, 51(12): 100-109.
[6] HE Haotian, ZHOU Bei, GUO Shaozhong, ZHANG Zuoyan, HAO Jiangwei, JI Liguang, XU Jinchen. Automatic Mixing Precision Optimization for Matrix Multiplication Calculation [J]. Computer Science, 2024, 51(11A): 240300057-10.
[7] PEI Xue, WEI Shuai, SHAO Yangxue, YU Hong, GE Chenyang. Compilation Optimization and Implementation of High-order Cryptographic Operators on FPGA [J]. Computer Science, 2024, 51(11A): 231200184-11.
[8] MO Shangfeng, ZHOU Zhenfen, HU Yonghua, XU Minmin, MAO Chunxian, YUAN Yudi. Transplantation and Optimization of Row-vector-matrix Multiplication in Complex Domain Based on FT-M7002 [J]. Computer Science, 2023, 50(11A): 220900277-6.
[9] ZHU Jian, HU Kai, WANG Jun, LI Jie, YE Yafei, SHI Xiyan. Reliable Smart Contract Automatic Generation Based on Event-B [J]. Computer Science, 2023, 50(10): 343-349.
[10] GAO Xiu-wu, HUANG Liang-ming, JIANG Jun. Optimization Method of Streaming Storage Based on GCC Compiler [J]. Computer Science, 2022, 49(11): 76-82.
[11] WANG Bo-yang, PANG Jian-min, XU Jin-long, ZHAO Jie, TAO Xiao-han, ZHU Yu. Matrix Multiplication Vector Code Generation Based on Polyhedron Model [J]. Computer Science, 2022, 49(10): 44-51.
[12] CHEN Tao, SHU Hui, XIONG Xiao-bing. Study of Universal Shellcode Generation Technology [J]. Computer Science, 2021, 48(4): 288-294.
[13] HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28.
[14] HU Wei-fang, CHEN Yun, LI Ying-ying, SHANG Jian-dong. Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation [J]. Computer Science, 2021, 48(12): 49-58.
[15] HAN Xiao-dong, GAO Fei, ZHANG Li-wei. Novel Real-time Algorithm for Critical Path of Linear Network Coding [J]. Computer Science, 2020, 47(9): 232-237.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!