Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (4): 78-85. doi: 10.11896/jsjkx.230200024
WANG Zhen (王震), NIE Kai (聂凯), HAN Lin (韩林)
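Abstract: The auto-vectorization cost model is a key component of a compiler's auto-vectorization optimization. Its job is to estimate whether code will run faster after a vectorization transformation is applied. When the cost model is inaccurate, the compiler applies vectorization transformations with negative benefit, which lowers the program's execution efficiency. To address the imprecision of the default cost model in the GCC compiler, this paper proposes an auto-vectorization cost model based on instruction MKS, using the Intel Xeon Silver 4214R CPU as the target platform. The model takes into account each instruction's machine mode, operation type, and operation intensity, and uses a gradient descent algorithm to automatically search for approximate costs of the different instruction types. Single-threaded tests were run on SPEC2006 and SPEC2017, and the experimental results show that the model reduces the number of incorrect benefit estimates. Compared with vectorized programs generated using the default cost model, GCC with the MKS cost model achieves a speedup of up to 4.72% on the SPEC2006 benchmarks and up to 7.08% on the SPEC2017 benchmarks.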
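The abstract does not give the model's formulation, so the following is only a minimal sketch of the general idea of fitting per-instruction-type costs by gradient descent. It assumes a linear cost (the sum of instruction counts times per-class costs) and a squared-error loss against measured loop costs. The function names `fit_costs` and `vectorization_profitable`, the three instruction classes, and all numbers are hypothetical and are not taken from the paper or from GCC.

```python
def fit_costs(counts, measured, lr=0.01, epochs=5000):
    """Fit one non-negative cost per instruction class by batch gradient descent.

    counts[i][j] -- number of class-j instructions in loop i (hypothetical data)
    measured[i]  -- measured cost of loop i, in arbitrary units
    Assumption: cost is linear in the counts; loss is mean squared error.
    """
    n_samples = len(counts)
    n_classes = len(counts[0])
    costs = [1.0] * n_classes  # start every class at unit cost
    for _ in range(epochs):
        grad = [0.0] * n_classes
        for row, target in zip(counts, measured):
            predicted = sum(c * k for c, k in zip(costs, row))
            err = predicted - target
            for j, k in enumerate(row):
                grad[j] += 2.0 * err * k / n_samples
        # Gradient step, clamped so that no instruction gets a negative cost.
        costs = [max(0.0, c - lr * g) for c, g in zip(costs, grad)]
    return costs


def vectorization_profitable(scalar_counts, vector_counts, costs, vf):
    """Compare vf scalar iterations against one vector iteration."""
    scalar_cost = vf * sum(c * k for c, k in zip(costs, scalar_counts))
    vector_cost = sum(c * k for c, k in zip(costs, vector_counts))
    return vector_cost < scalar_cost


if __name__ == "__main__":
    # Four made-up loops described by counts of three made-up instruction classes.
    loops = [[2, 1, 0], [1, 0, 2], [0, 3, 1], [1, 1, 1]]
    timings = [5.0, 9.0, 13.0, 8.0]
    learned = fit_costs(loops, timings)
    print([round(c, 3) for c in learned])
    print(vectorization_profitable([1, 1, 0], [1, 1, 1], learned, vf=4))
```

A decision rule of this shape only accepts a vectorized loop when its estimated cost is below that of the equivalent scalar iterations, which is where inaccurate per-instruction costs would lead to the negative-benefit transformations the abstract describes.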