Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100012-7. doi: 10.11896/jsjkx.241100012

• Computer Software •

Speculative Control Flow Vectorization Method for SIMD

HAN Lin1,2, WU Ruofeng1, LIU Haohao2, NIE Kai2, LI Haoran2, CHEN Mengyao2

  1 College of Cyber Security, Zhongyuan University of Technology, Zhengzhou 451191, China
  2 National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Corresponding author: NIE Kai (ieknie@zzu.edu.cn)
  • About the author: 674767963@qq.com
  • Supported by:
    2024 Major Science and Technology Project of Henan Province 1 (241100210100), 2024 Henan Province Science and Technology Key Project (242102211094), 2022 Major Science and Technology Project of Henan Province 17 (221100210600) and 2023 National Key Research and Development Program of China, High Performance Computing Special Project (2023YFB3002505).

Abstract: SIMD automatic vectorization is an important way to exploit the full computing power of processors and improve application performance, but control flow poses a major challenge to it. Traditional control flow vectorization relies on if-conversion, which lowers code execution efficiency. To alleviate this problem, a speculative control flow vectorization method for SIMD is proposed. The method detects predicate-related regions in the vector code, uses a cost model to guide speculative transformations targeting branch consistency within those regions, and eliminates useless predicated execution at runtime, thereby removing the inefficiency caused by redundant computation. The method is implemented in GCC 10.3, a current mainstream compiler. The experiments use the industry-recognized SPEC CPU 2006 benchmark suite and the TSVC suite, which tests vectorization capability. With the method applied, the performance of SPEC CPU 2006 benchmark 481 improves by 10%, and some typical TSVC_2 test cases improve by more than 20%. These results on standard benchmark suites show that the method effectively improves the execution efficiency of the control flow vectorized code generated by GCC.

Key words: SIMD, GCC, Control flow, Cost model, Speculative vectorization
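
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' GCC implementation: it emulates vector lanes with scalar C loops, and the vector length `VL`, the function names, and the loop body are all hypothetical. It also assumes that "branch consistency" means every lane in a vector group takes the same branch direction, and it omits the cost model entirely. The if-converted kernel evaluates both paths for every lane and selects by mask, while the speculative kernel first checks at runtime whether the mask is uniform and, if so, runs only the path that is needed.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4 /* hypothetical vector length: lanes per emulated SIMD group */

/* Hypothetical loop body: the two sides of the branch. */
static float then_path(float a, float b) { return a * b + 1.0f; }
static float else_path(float a, float b) { return b - a; }

/* Scalar reference: the original loop with control flow. */
void kernel_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0.0f)
            out[i] = then_path(a[i], b[i]);
        else
            out[i] = else_path(a[i], b[i]);
    }
}

/* If-converted form: every lane computes BOTH paths, then a mask selects
 * one result. One of the two computations is always redundant. */
void kernel_if_converted(const float *a, const float *b, float *out, size_t n) {
    assert(n % VL == 0); /* sketch only handles whole groups */
    for (size_t i = 0; i < n; i += VL) {
        for (size_t l = 0; l < VL; l++) {
            int mask = a[i + l] > 0.0f;
            float t = then_path(a[i + l], b[i + l]);
            float e = else_path(a[i + l], b[i + l]);
            out[i + l] = mask ? t : e;
        }
    }
}

/* Speculative form: check at runtime whether the mask is uniform across the
 * group. Uniform groups run a single path with no predication; divergent
 * groups fall back to the if-converted code.
 * Returns the number of groups that took a uniform fast path. */
size_t kernel_speculative(const float *a, const float *b, float *out, size_t n) {
    assert(n % VL == 0);
    size_t uniform_groups = 0;
    for (size_t i = 0; i < n; i += VL) {
        size_t taken = 0;
        for (size_t l = 0; l < VL; l++)
            taken += (a[i + l] > 0.0f);

        if (taken == VL) {            /* all lanes true: "then" path only */
            for (size_t l = 0; l < VL; l++)
                out[i + l] = then_path(a[i + l], b[i + l]);
            uniform_groups++;
        } else if (taken == 0) {      /* all lanes false: "else" path only */
            for (size_t l = 0; l < VL; l++)
                out[i + l] = else_path(a[i + l], b[i + l]);
            uniform_groups++;
        } else {                      /* divergent: predicated fallback */
            for (size_t l = 0; l < VL; l++) {
                int mask = a[i + l] > 0.0f;
                float t = then_path(a[i + l], b[i + l]);
                float e = else_path(a[i + l], b[i + l]);
                out[i + l] = mask ? t : e;
            }
        }
    }
    return uniform_groups;
}
```

All three kernels produce identical output; they differ only in how much work they do per group. The runtime mask check is itself an extra cost, so it pays off only when uniform groups are common, which is presumably the kind of trade-off the paper's cost model is meant to weigh.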

CLC Number: 

  • TP314