计算机科学 ›› 2014, Vol. 41 ›› Issue (5): 27-32.doi: 10.11896/j.issn.1002-137X.2014.05.006
侯永生,赵荣彩,黄磊,韩林
HOU Yong-sheng,ZHAO Rong-cai,HUANG Lei and HAN Lin
摘要: 高性能微处理器中普遍采用SIMD向量扩展作为计算加速部件。在深入研究SIMD扩展部件数据依赖关系约束条件的基础上,提出一种基于依赖关系逆向图的Tarjan扩展算法,提高了SIMD并行性识别率,并结合传统向量化方法,实现了面向SIMD扩展部件的循环优化技术,消除了不可向量化语句对可向量化语句在数据重组中不必要的开销。实际程序测试结果显示,其在基于依赖关系的SIMD并行性判定方面优于ICC编译器,经过循环优化后,最终生成的SIMD代码其执行效率平均提高了12%。
[1] Larsen S,Amarasinghe S.Exploiting superword level parallelism with multimedia instruction sets[C]∥Conference on Programming Language Design and Implementation.Vancouver,BCCanada,June 2000:145-156 [2] Intel.IA-32Intel Architecture Optimization Reference Manual.http://www.intel.com/products/processor/manuals/index.htm,2005 [3] Franchetti F,Kral S,Lorenz J,et al.Efficientutilization of SIMD extensions[J].Proc.IEEE,2005,3(2):409-425 [4] Al-Shayka O,Chen S,Mattavelli M.Introduction to the special issue on multimedia implementation[J].IEEE Transactions on Circuits and System for Video Technology,2002,2(8):629-632 [5] Wu Peng,Eichenberger A E,Wang A.Efficient SIMD CodeGeneration for Runtime Alignment and Length Conversion[C]∥Proceedings of the International Symposium on Code Generation and Optimization.March 2005:153-164 [6] Wu Peng,Eichenberger A E,Wang A,et al.An integrated simdization framework using virtual vectors[C]∥Proceedings of the 19th Annual International Conference on Supercomputing.June 2005:169-178 [7] Kennedy K,Allen J R.Optimizing compilers for modern archi-tectures:a dependence-based approach[M].San Francisco CA,USA:Morgan Kaufmann Publishers Inc,2002 [8] Bacon D F,Graham S L,Sharp O J.Compiler Transformations for High-Performance Computing[J].ACM Computing Surveys,1994,26(4) [9] Tarjan R E.Depth-first search and linear graph algorithms[J].SIAM Journal on Computing,1972,1(2):146-160 [10] Fournier J J A,Moore S W.A vector approach to cryptography implementation[M]∥Digital Rights Management.Technologyies,issues,challenges and systems.Springer Berlin Heidlberg,2005:277-297 [11] Franchetti F.A portable short vector version of FFTW [C]∥Proc.Fourth IMACS Symposium on Mathematical Modeling (MATHMOD 2003).2003,2:1539-1548 [12] Slingerland N T,Smith A J.Measuring instruction sets for ge-neral purpose microprocessors:A survey [R].CSD-00-1122.University of California at Berkeley Technical Report,2000 [13] Shin J,Hall M,Chame J.Superword-level parallelism in thepresence of control flow[C]∥International Symposium on Code Generation and Optimization.2005:165-175 [14] Feld D,Soddemann T,Jünger M,et al.Facilitate SIMD-Code-Generation in the Polyhedral Model by Hardware-aware Automatic Code-Transformation[C]∥Proceeding of the 3rd International Workshop on Polyhedral Compilation Techniques.2013:45-54 [15] Eichenberger A,Wu P,O’Brien K.Vectorization for simd architectures with alignment constraints[C]∥PLDI.2004 [16] Nuzman D,Rosen I,Zaks A.Auto-vectorization of interleaveddata for simd[C]∥PLDI.2006 [17] Kong M,Veras R,Stock K,et al.Polyhedral TransformationsMeet SIMD Code Generation[C]∥PLID.I2013 [18] Nuzman D,Zaks A.Outer-loop vectorization:revisited for short simd architectures[C]∥PACT.2008 [19] Chen C,Chame J,Hall M.Chill:A framework for composinghigh-level loop transformations[R].Technical Report 08-897.USC Computer Science Technical Report.2008 [20] Trifunovic K,Nuzman D,Cohen A,et al.Polyhedral-modelguided loop-nest auto-vectorization[C]∥PACT.Sept.2009 [21] Vasilache N,Meister B,Baskaran M,et al.Joint scheduling and layout optimization to enable multi-level vectorization[C]∥Proc.of IMPACT’12.Jan.2012 |
No related articles found! |
|