Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100012-7. doi: 10.11896/jsjkx.241100012

• Computer Software & Architecture •

Speculative Control Flow Vectorization Method for SIMD

HAN Lin1,2, WU Ruofeng1, LIU Haohao2, NIE Kai2, LI Haoran2, CHEN Mengyao2   

  1 College of Cyber Security, Zhongyuan University of Technology, Zhengzhou 451191, China
  2 National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Supported by:
    2024 Major Science and Technology Project of Henan Province 1 (241100210100), 2024 Henan Province Science and Technology Key Project (242102211094), 2022 Major Science and Technology Project of Henan Province 17 (221100210600) and 2023 National Key Research and Development Program of China-High Performance Computing Special Project (2023YFB3002505).

Abstract: SIMD automatic vectorization is an important means of exploiting the computing power of processors and improving application performance, but control flow poses a major challenge to it. Traditional control-flow vectorization relies on if-conversion, which in turn makes the generated code inefficient. To alleviate this problem, a speculative control-flow vectorization method for SIMD is proposed. The method detects predicate-related regions in the vector code, uses a cost model to guide a speculative transformation based on branch consistency within each region, and eliminates useless predicated execution at runtime, thereby removing the inefficiency caused by redundant computation. The method is implemented on the mainstream GCC 10.3 compiler. The experiments use the industry-recognized SPEC CPU 2006 benchmark suite and the TSVC suite, which tests vectorization capability. The results show that the method improves the performance of SPEC CPU 2006 benchmark 481 by 10%, and that the speedup of typical TSVC_2 test cases can exceed 20%. These results on standard benchmark suites show that the method effectively improves the execution efficiency of the control-flow code vectorized by GCC.
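The idea summarized in the abstract can be illustrated with a minimal sketch in plain C. This is not the authors' GCC implementation. It is a scalar emulation under stated assumptions: the vector length `VL`, the function names `kernel_if_converted` and `kernel_speculative`, the counter `uniform_chunks`, and the loop body itself are all hypothetical and chosen only for illustration. The first function mimics if-converted code, where both arms of the branch are evaluated for every element. The second checks, per chunk of `VL` elements, whether all lanes agree on the branch direction and, if so, runs only the arm that is needed.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4 /* assumed vector length in lanes (hypothetical, not from the paper) */

/* If-converted form: both arms are computed for every element and the
 * predicate only selects which result is stored. */
void kernel_if_converted(const float *a, float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float then_val = a[i] * 2.0f; /* computed even when the else arm wins */
        float else_val = a[i] + 1.0f; /* computed even when the then arm wins */
        b[i] = (a[i] > 0.0f) ? then_val : else_val;
    }
}

/* Speculative form: for each chunk of VL elements, test whether the branch
 * is consistent across all lanes. A consistent chunk runs a single arm with
 * no predication; a mixed chunk falls back to the predicated form.
 * *uniform_chunks is incremented for each consistent chunk, so the caller
 * is expected to initialize it (for example to 0). */
void kernel_speculative(const float *a, float *b, size_t n,
                        size_t *uniform_chunks)
{
    assert(uniform_chunks != NULL);
    size_t i = 0;

    for (; i + VL <= n; i += VL) {
        int taken = 0; /* number of lanes whose condition is true */
        for (int l = 0; l < VL; l++)
            taken += (a[i + l] > 0.0f);

        if (taken == VL) {
            /* all lanes take the then arm */
            for (int l = 0; l < VL; l++)
                b[i + l] = a[i + l] * 2.0f;
            (*uniform_chunks)++;
        } else if (taken == 0) {
            /* all lanes take the else arm */
            for (int l = 0; l < VL; l++)
                b[i + l] = a[i + l] + 1.0f;
            (*uniform_chunks)++;
        } else {
            /* divergent chunk: predicated fallback, both arms evaluated */
            for (int l = 0; l < VL; l++) {
                float then_val = a[i + l] * 2.0f;
                float else_val = a[i + l] + 1.0f;
                b[i + l] = (a[i + l] > 0.0f) ? then_val : else_val;
            }
        }
    }

    /* scalar tail for the elements that do not fill a whole chunk */
    for (; i < n; i++)
        b[i] = (a[i] > 0.0f) ? a[i] * 2.0f : a[i] + 1.0f;
}
```

Both functions produce the same output array. The difference is that the speculative version pays for a per-chunk consistency check, which is presumably the kind of trade-off a cost model would weigh: the check is worthwhile only when chunks are often consistent and the skipped arm is expensive enough.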

Key words: SIMD, GCC, Control flow, Cost model, Speculative vectorization

CLC Number: TP314