Computer Science ›› 2024, Vol. 51 ›› Issue (4): 78-85. doi: 10.11896/jsjkx.230200024

• High Performance Computing •

Auto-vectorization Cost Model Based on Instruction MKS

WANG Zhen1, NIE Kai2, HAN Lin2

  1. 1 School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
    2 National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450000, China
  • Received: 2023-02-04 Revised: 2023-06-13 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: HAN Lin (hanlin@zzu.edu.cn)
  • Supported by:
    Major Science and Technology Special Projects in Henan Province for 2022 (221100210600) and 22 Qiushi Research Initiation (Natural Science) (32213247).

Abstract: The auto-vectorization cost model is an important component of a compiler's auto-vectorization optimization. Its role is to evaluate whether code will gain performance after a vectorization transformation is applied. When the cost model is inaccurate, the compiler applies vectorization transformations with negative benefit, which reduces the execution efficiency of the program. To address the inaccuracy of the GCC compiler's default cost model, an auto-vectorization cost model based on instruction MKS is proposed, with the Intel Xeon Silver 4214R CPU as the target platform. The model takes into account the machine mode, operation type, and operation intensity of instructions, and uses a gradient descent algorithm to automatically search for the approximate cost of each instruction type. Single-thread tests are carried out on SPEC2006 and SPEC2017. Experimental results show that the model reduces the number of incorrect benefit estimates. Compared with the vectorized programs generated under the default cost model, GCC with the MKS cost model achieves a maximum speedup of 4.72% on the SPEC2006 benchmarks and 7.08% on the SPEC2017 benchmarks.

Key words: GCC compiler, Auto-vectorization, Cost model, Profit evaluation, Gradient descent
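
The abstract describes two ideas that a short example can make concrete: a profitability check that compares scalar and vector cost, and a gradient descent search for per-instruction-type costs. The Python sketch below illustrates both under simplifying assumptions. It is not the paper's model: the linear cost form, the three instruction classes, the "true" costs, and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The linear cost form, the instruction classes,
# and every number below are assumptions, not taken from the paper or GCC.

def predict(weights, counts):
    """Modeled cost of a loop body: per-class instruction counts times per-class cost."""
    return sum(w * c for w, c in zip(weights, counts))


def fit_costs(samples, n_classes, lr=0.05, epochs=2000):
    """Fit per-class costs by batch gradient descent on mean squared error.

    samples is a list of (counts, measured_cost) pairs, where counts[i] is the
    number of instructions of class i in the loop body.
    """
    weights = [1.0] * n_classes
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * n_classes
        for counts, measured in samples:
            err = predict(weights, counts) - measured
            for i, c in enumerate(counts):
                grad[i] += 2.0 * err * c / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def is_profitable(scalar_cost, vector_cost, vf):
    """Vectorize only if one vector iteration is cheaper than vf scalar iterations."""
    return vector_cost < scalar_cost * vf


# Synthetic training data generated from made-up costs for (load, add, mul).
TRUE_COSTS = [4.0, 1.0, 3.0]
LOOP_BODIES = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (2, 1, 1)]
SAMPLES = [(counts, predict(TRUE_COSTS, counts)) for counts in LOOP_BODIES]
FITTED = fit_costs(SAMPLES, n_classes=3)
```

In a real setting the measured costs would come from timing loops on the target CPU rather than from a known cost vector, and the fitted values would replace the compiler's default per-instruction estimates before a check such as `is_profitable` is evaluated.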

CLC Number:

  • TP314