面向SIMD扩展部件的循环优化研究

doi:10.11896/j.issn.1002-137X.2014.05.006

计算机科学 ›› 2014, Vol. 41 ›› Issue (5): 27-32.doi: 10.11896/j.issn.1002-137X.2014.05.006

面向SIMD扩展部件的循环优化研究

侯永生,赵荣彩,黄磊,韩林

数字工程与先进计算国家重点实验室郑州450002;数字工程与先进计算国家重点实验室郑州450002;数字工程与先进计算国家重点实验室郑州450002;数字工程与先进计算国家重点实验室郑州450002

出版日期:2018-11-14 发布日期:2018-11-14
基金资助:
本文受“核高基”重大专项“支持国产CPU的编译系统及工具链”分课题“自动并行化与二进制翻译系统”(2009ZX10036-001-001-2)资助

Research on SIMD-oriented Loop Optimizations

HOU Yong-sheng,ZHAO Rong-cai,HUANG Lei and HAN Lin

Online:2018-11-14 Published:2018-11-14

摘要/Abstract

摘要： 高性能微处理器中普遍采用SIMD向量扩展作为计算加速部件。在深入研究SIMD扩展部件数据依赖关系约束条件的基础上,提出一种基于依赖关系逆向图的Tarjan扩展算法,提高了SIMD并行性识别率,并结合传统向量化方法,实现了面向SIMD扩展部件的循环优化技术,消除了不可向量化语句对可向量化语句在数据重组中不必要的开销。实际程序测试结果显示,其在基于依赖关系的SIMD并行性判定方面优于ICC编译器,经过循环优化后,最终生成的SIMD代码其执行效率平均提高了12%。

关键词: SIMD,依赖关系,循环优化,Tarjan

Abstract: Accelerating program performance via SIMD vector is very common in modern processors.Based on SIMD data dependency constraints,we presented the design and implementation of the improved Tarjan algorithm with reverse dependency graph to increase recognition rate of SIMD parallelization．And in combination with traditional vectorization method,the SIMD-oriented loop optimization technology was proposed that avoids unnecessary overhead in data reorganization．Our experimental results demonstrate that the performance of generated SIMD using the technology code is increased by 12% compared with ICC.

Key words: SIMD,Dependence analysis,Loop optimization,Tarjan

侯永生,赵荣彩,黄磊,韩林. 面向SIMD扩展部件的循环优化研究[J]. 计算机科学, 2014, 41(5): 27-32. https://doi.org/10.11896/j.issn.1002-137X.2014.05.006

HOU Yong-sheng,ZHAO Rong-cai,HUANG Lei and HAN Lin. Research on SIMD-oriented Loop Optimizations[J]. Computer Science, 2014, 41(5): 27-32. https://doi.org/10.11896/j.issn.1002-137X.2014.05.006

参考文献

[1] Larsen S,Amarasinghe S．Exploiting superword level parallelism with multimedia instruction sets[C]∥Conference on Programming Language Design and Implementation．Vancouver,BCCanada,June 2000:145-156
[2] Intel．IA-32Intel Architecture Optimization Reference Manual．http://www.intel.com/products/processor/manuals/index.htm,2005
[3] Franchetti F,Kral S,Lorenz J,et al.Efficientutilization of SIMD extensions[J].Proc．IEEE,2005,3(2):409-425
[4] Al-Shayka O,Chen S,Mattavelli M．Introduction to the special issue on multimedia implementation[J]．IEEE Transactions on Circuits and System for Video Technology,2002,2(8):629-632
[5] Wu Peng,Eichenberger A E,Wang A．Efficient SIMD CodeGeneration for Runtime Alignment and Length Conversion[C]∥Proceedings of the International Symposium on Code Generation and Optimization．March 2005:153-164
[6] Wu Peng,Eichenberger A E,Wang A,et al.An integrated simdization framework using virtual vectors[C]∥Proceedings of the 19th Annual International Conference on Supercomputing．June 2005:169-178
[7] Kennedy K,Allen J R．Optimizing compilers for modern archi-tectures:a dependence-based approach[M].San Francisco CA,USA:Morgan Kaufmann Publishers Inc,2002
[8] Bacon D F,Graham S L,Sharp O J．Compiler Transformations for High-Performance Computing[J].ACM Computing Surveys,1994,26(4)
[9] Tarjan R E.Depth-first search and linear graph algorithms[J]．SIAM Journal on Computing,1972,1(2):146-160
[10] Fournier J J A,Moore S W．A vector approach to cryptography implementation[M]∥Digital Rights Management.Technologyies,issues,challenges and systems.Springer Berlin Heidlberg,2005:277-297
[11] Franchetti F．A portable short vector version of FFTW [C]∥Proc．Fourth IMACS Symposium on Mathematical Modeling (MATHMOD 2003)．2003,2:1539-1548
[12] Slingerland N T,Smith A J．Measuring instruction sets for ge-neral purpose microprocessors:A survey [R]．CSD-00-1122．University of California at Berkeley Technical Report,2000
[13] Shin J,Hall M,Chame J．Superword-level parallelism in thepresence of control flow[C]∥International Symposium on Code Generation and Optimization.2005:165-175
[14] Feld D,Soddemann T,Jünger M,et al.Facilitate SIMD-Code-Generation in the Polyhedral Model by Hardware-aware Automatic Code-Transformation[C]∥Proceeding of the 3rd International Workshop on Polyhedral Compilation Techniques.2013:45-54
[15] Eichenberger A,Wu P,O’Brien K．Vectorization for simd architectures with alignment constraints[C]∥PLDI．2004
[16] Nuzman D,Rosen I,Zaks A．Auto-vectorization of interleaveddata for simd[C]∥PLDI．2006
[17] Kong M,Veras R,Stock K,et al.Polyhedral TransformationsMeet SIMD Code Generation[C]∥PLID．I2013
[18] Nuzman D,Zaks A．Outer-loop vectorization:revisited for short simd architectures[C]∥PACT．2008
[19] Chen C,Chame J,Hall M．Chill:A framework for composinghigh-level loop transformations[R]．Technical Report 08-897．USC Computer Science Technical Report．2008
[20] Trifunovic K,Nuzman D,Cohen A,et al.Polyhedral-modelguided loop-nest auto-vectorization[C]∥PACT．Sept.2009
[21] Vasilache N,Meister B,Baskaran M,et al．Joint scheduling and layout optimization to enable multi-level vectorization[C]∥Proc.of IMPACT’12．Jan．2012

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

面向SIMD扩展部件的循环优化研究

Research on SIMD-oriented Loop Optimizations

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 0

Metrics

本文评价

推荐阅读 0