Computer Science, 2025, Vol. 52, Issue (9): 186-194. doi: 10.11896/jsjkx.241100130

• High Performance Computing •

SLP Vectorization Across Basic Blocks Based on Region Partitioning

HAN Lin1,2, DING Yongqiang1, CUI Pingfei1, LIU Haohao2, LI Haoran2, CHEN Mengyao2   

  • 1 College of Cyber Security, Zhongyuan University of Technology, Zhengzhou 451191, China
    2 National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
  • Received: 2024-11-25 Revised: 2025-04-15 Online: 2025-09-15 Published: 2025-09-11
  • About author: HAN Lin, born in 1978, professor, doctoral supervisor, is a member of CCF (No.16416M). His main research interests include high-performance computing, advanced compilation, program optimization and home-grown autonomous control.
    CUI Pingfei, born in 1975, associate professor, master's supervisor. His main research interests include domestic sovereign control, software reverse engineering and code security analysis.
  • Supported by:
    2024 Henan Provincial Science and Technology Tackling Project(242102211094) and 2022 Major Science and Technology Programs in Henan Province 17(221100210600).

Abstract: Automatic vectorization is a key technique in mainstream compilers for exploiting data-level parallelism and improving program performance. Traditional superword-level parallelism (SLP) vectorization handles statements that span basic blocks poorly: when consecutive vectorizable instructions are split by a basic block boundary, its ability to detect the vectorization opportunity is limited. To address this, this paper proposes a region-based cross-basic-block SLP vectorization method that extends the analysis scope to multiple basic blocks related by dominance, breaking through basic block boundaries and uncovering more vectorization opportunities. The method is implemented in the GCC 10.3.0 compiler and evaluated on relevant program segments from the SPEC CPU2006 benchmark. Experimental results show that, compared with traditional SLP methods, the proposed method achieves a speedup of up to 12% on SPEC CPU2006, an average speedup of 8% on the related test programs, and an average speedup of 3% on the PolyBench benchmark, validating its effectiveness. This work provides a technical reference for improving SLP vectorization efficiency in GCC.
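
The pattern the abstract describes can be illustrated with a minimal sketch in C. It is not taken from the paper: the function names `add4_split`, `add4_packed`, `split_matches_packed` and `self_check` are hypothetical, and the example assumes the intervening branch does not touch the arrays being packed. Four isomorphic additions on adjacent elements are split by a branch, so a pass that looks at one basic block at a time sees only two of them at once, while a pass that considers the dominating block together with the block it dominates could group all four.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical kernel: four isomorphic adds on adjacent elements.
   The branch in the middle ends the first basic block, so a
   per-block SLP pass sees only two adds at a time. */
void add4_split(float *a, const float *b, const float *c, int *flag)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    if (*flag)          /* unrelated control flow: basic block boundary */
        *flag = 0;
    a[2] = b[2] + c[2]; /* this block is dominated by the first one */
    a[3] = b[3] + c[3];
}

/* The grouping a region-wide pack would aim for: all four lanes
   treated as one unit. Written as a loop only to show the grouping;
   this is an illustration, not actual compiler output. */
void add4_packed(float *a, const float *b, const float *c, int *flag)
{
    for (int lane = 0; lane < 4; ++lane)
        a[lane] = b[lane] + c[lane];
    if (*flag)
        *flag = 0;
}

/* Returns 1 when both versions produce identical results. */
int split_matches_packed(void)
{
    const float b[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float c[4] = {0.5f, 0.25f, 0.125f, 0.0625f};
    float r1[4], r2[4];
    int f1 = 1, f2 = 1;

    add4_split(r1, b, c, &f1);
    add4_packed(r2, b, c, &f2);
    return memcmp(r1, r2, sizeof r1) == 0 && f1 == f2;
}

/* Aborts if the two versions ever disagree. */
void self_check(void)
{
    assert(split_matches_packed());
}
```

Calling `self_check()` from a test driver confirms that the two forms agree on this input. Regrouping the adds is only valid here because the branch reads and writes nothing but `*flag`; a real pass would have to prove that kind of independence before packing across the boundary.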

Key words: Compilation optimization, Automatic vectorization, SLP, Across basic blocks, Region partitioning

CLC Number: TP314