Computer Science (计算机科学), 2025, Vol. 52, Issue (5): 76-82. doi: 10.11896/jsjkx.231200174
LIU Lili (刘丽丽)1, SHAN Zheng (单征)1, LI Yingying (李颖颖)1, WU Wenhao (武文浩)2, LIU Wenbo (刘文博)1
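Abstract: As processor technology continues to develop, SIMD (Single Instruction Multiple Data) vectorization has come into wide use across many fields. Previous research, however, has concentrated on loops and basic blocks, whereas whole-function vectorization can make better use of SIMD instructions and thereby improve application performance. This paper proposes a directive-based function vectorization method. First, a relatively simple directive is added to a loop that contains function calls, which allows the instructions in the loop that involve those calls to be vectorized. Second, the callee is vectorized by whole-function vectorization: a vectorized version of the entire function is generated instead of inlining it. Finally, the call sites in the loop are processed to generate vectorized function call instructions. Ten benchmarks selected from the ISPC benchmarks and the SIMD library benchmarks were used to evaluate the method. The experimental results show an average speedup of 6.949x over scalar code.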
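The three steps in the abstract (annotate the loop, vectorize the callee as a whole function, rewrite the call site) can be illustrated with a minimal C sketch. The abstract does not give the syntax of the paper's own directive, so this sketch substitutes the standard OpenMP `declare simd` and `simd` directives as a stand-in; it is not the authors' implementation. The names `clamp_scale` and `clamp_scale_all` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical callee with control flow. "declare simd" asks an
   OpenMP-aware compiler to emit a vector variant of the whole function
   in addition to the scalar one, rather than relying on inlining.
   uniform(lo, hi) states that the bounds are the same for every lane.
   This is an OpenMP stand-in, not the paper's directive. */
#pragma omp declare simd uniform(lo, hi) notinbranch
float clamp_scale(float x, float lo, float hi)
{
    float y = 2.0f * x;
    if (y < lo)
        return lo;
    if (y > hi)
        return hi;
    return y;
}

/* Hypothetical caller. The "simd" directive on the loop permits the
   compiler to vectorize it, so the call site may be replaced by a call
   to the vector variant of clamp_scale. */
void clamp_scale_all(const float *in, float *out, int n, float lo, float hi)
{
    assert(n >= 0);
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = clamp_scale(in[i], lo, hi);
}
```

With GCC or Clang, the directives take effect when the file is compiled with `-fopenmp-simd` (or `-fopenmp`); without either flag they are ignored and the code runs as ordinary scalar C with the same results. Whether the compiler actually emits and calls a vector variant depends on the target and optimization level.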