Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100012-7. doi: 10.11896/jsjkx.241100012
韩林1,2, 吴若枫1, 刘浩浩2, 聂凯2, 李浩然2, 陈梦尧2
HAN Lin1,2, WU Ruofeng1, LIU Haohao2, NIE Kai2, LI Haoran2, CHEN Mengyao2
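Abstract: SIMD auto-vectorization is an important way to exploit a processor's full computing capability and improve application performance, but control flow makes auto-vectorization considerably harder. Traditional control-flow vectorization relies on if-conversion, which produces code that executes inefficiently. To mitigate this problem, a SIMD-oriented speculative vectorization method for control flow is proposed. The method detects predicate-related regions in the vector code and uses a cost model to guide a speculative transformation within each region that targets branch uniformity. At run time this removes useless predicated execution and therefore the inefficiency caused by redundant computation. The method is implemented in the mainstream GCC 10.3 compiler. It is evaluated on the industry-recognized SPEC CPU 2006 benchmark suite and on the TSVC suite, which tests vectorization capability. With the method applied, the SPEC2006 benchmark 481 runs 10% faster, and some typical TSVC_2 test cases improve by more than 20%. These results on standard benchmark suites show that the method effectively improves the execution efficiency of the control-flow vectorized code generated by GCC.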
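The difference between if-conversion and a branch-uniformity speculation can be illustrated with a small sketch. It is not the GCC implementation described in the abstract: it emulates a vector with a plain array of `VL` lanes, the names `VL`, `then_arm`, `else_arm`, `if_converted`, and `speculative` are hypothetical, and the cost model that decides where to apply the transformation is not modeled. Each function returns the number of vector-wide arm evaluations it performed, so the saved work is visible.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4 /* hypothetical vector length (lanes per emulated vector) */

static int then_arm(int x) { return x * 2; }   /* illustrative "then" computation */
static int else_arm(int x) { return x + 100; } /* illustrative "else" computation */

/* Scalar reference: if (a[i] > 0) b[i] = then_arm(a[i]); else b[i] = else_arm(a[i]); */
void scalar_ref(const int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i] = (a[i] > 0) ? then_arm(a[i]) : else_arm(a[i]);
}

/* If-converted form: both arms are computed for every vector, then blended by mask. */
size_t if_converted(const int *a, int *b, size_t n) {
    assert(n % VL == 0); /* sketch assumes no remainder loop */
    size_t arm_evals = 0;
    for (size_t i = 0; i < n; i += VL) {
        int mask[VL], t[VL], e[VL];
        for (size_t l = 0; l < VL; l++) mask[l] = a[i + l] > 0;
        for (size_t l = 0; l < VL; l++) t[l] = then_arm(a[i + l]);
        for (size_t l = 0; l < VL; l++) e[l] = else_arm(a[i + l]);
        arm_evals += 2;
        for (size_t l = 0; l < VL; l++) b[i + l] = mask[l] ? t[l] : e[l];
    }
    return arm_evals;
}

/* Speculative form: a run-time check for a uniform mask skips the unused arm. */
size_t speculative(const int *a, int *b, size_t n) {
    assert(n % VL == 0);
    size_t arm_evals = 0;
    for (size_t i = 0; i < n; i += VL) {
        int mask[VL], taken = 0;
        for (size_t l = 0; l < VL; l++) {
            mask[l] = a[i + l] > 0;
            taken += mask[l];
        }
        if (taken == VL) {        /* all lanes take the "then" branch */
            for (size_t l = 0; l < VL; l++) b[i + l] = then_arm(a[i + l]);
            arm_evals += 1;
        } else if (taken == 0) {  /* all lanes take the "else" branch */
            for (size_t l = 0; l < VL; l++) b[i + l] = else_arm(a[i + l]);
            arm_evals += 1;
        } else {                  /* divergent lanes: fall back to the predicated blend */
            for (size_t l = 0; l < VL; l++)
                b[i + l] = mask[l] ? then_arm(a[i + l]) : else_arm(a[i + l]);
            arm_evals += 2;
        }
    }
    return arm_evals;
}
```

In this sketch the uniformity check costs one extra reduction over the mask per vector, which is why a real compiler would presumably apply such a transformation only where a cost model predicts that uniform masks are common enough to pay for it.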