Computer Science, 2025, Vol. 52, Issue (9): 186-194. doi: 10.11896/jsjkx.241100130
HAN Lin1,2, DING Yongqiang1, CUI Pingfei1, LIU Haohao2, LI Haoran2, CHEN Mengyao2
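Abstract: Automatic vectorization is an important way to exploit data-level parallelism and improve program performance, and it is widely used in mainstream compilers. Superword-level parallelism (SLP) vectorization targets data parallelism among adjacent isomorphic statements: it detects such statements and packs their scalar instructions into vector instructions. However, the traditional SLP framework is weak at vectorizing statements across basic blocks. In particular, when a run of consecutive vectorizable instructions is split by a basic block boundary, SLP analysis cannot effectively discover the potentially vectorizable statements. To address this problem, a cross-basic-block SLP vectorization method based on region partitioning is proposed. By extending the analysis scope to multiple basic blocks within a dominance relationship, the method removes the restriction of basic block boundaries, captures more vectorization opportunities, and improves the effectiveness of SLP vectorization. The method is implemented in the GCC 10.3.0 compiler, and test programs containing the relevant code patterns are selected from the SPEC CPU2006 benchmark suite for evaluation. Experimental results show that, compared with the traditional SLP method, the proposed method improves the speedup of the selected SPEC CPU2006 programs by up to 12% and by 8% on average, and achieves an average speedup of 3% on PolyBench, which verifies its effectiveness. This work can serve as a technical reference for improving SLP vectorization efficiency in GCC.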
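To make the pattern described in the abstract concrete, the following is a minimal C sketch of four isomorphic statements on adjacent memory that are split by a basic block boundary. It is an illustration only, not the paper's algorithm: the function names `add4_split` and `add4_grouped` are hypothetical, and no claim is made about what a particular GCC version emits for either form. In `add4_split`, the `if` ends the first basic block, so an analysis confined to one block sees two groups of two statements; the first block dominates the last, which is the kind of region a cross-basic-block analysis could treat as a whole. `add4_grouped` shows the equivalent hand-regrouped form in which all four statements sit in one block.

```c
#include <stddef.h>

/* Hypothetical kernel: four isomorphic adds on adjacent elements.
 * The branch in the middle is unrelated to the adds, but it ends the
 * first basic block, so a block-local SLP pass sees only pairs. */
void add4_split(float *restrict a, const float *restrict b,
                const float *restrict c, int *counter, int flag)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    if (flag)               /* basic block boundary */
        (*counter)++;
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}

/* Hand-regrouped equivalent: the four adds are contiguous in a single
 * basic block, so they form one candidate group of width four.
 * The reordering is legal here because the branch neither reads nor
 * writes a, b, or c. */
void add4_grouped(float *restrict a, const float *restrict b,
                  const float *restrict c, int *counter, int flag)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
    if (flag)
        (*counter)++;
}
```

Both functions compute the same results. One way to compare how a given compiler treats them is to build with optimization enabled and inspect the vectorizer report, for example with GCC's `-O3 -fopt-info-vec`.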