计算机科学 ›› 2013, Vol. 40 ›› Issue (10): 24-28.
索维毅,赵荣彩,姚远,张小妹
SUO Wei-yi,ZHAO Rong-cai,YAO Yuan and ZHANG Xiao-mei
摘要: 现有的SLP优化算法无法处理内层循环中存在的依赖环和归约,并且在基本块边界产生大量的冗余拆包和赋值语句,从而导致向量化效率不高。针对该问题,提出了一种基于跨基本块变换和循环分布的SLP优化算法。该算法以控制流图为基础,根据基本块间各数组变量的Define-Use关系以及跨越基本块之间的数据依赖关系进行跨基本块的向量化变换,有序地采用跨基本块变换和循环分布,尽可能发掘最内层循环基本块内语句的并行性,使SLP自动向量化编译器生成具有更多SIMD指令的向量化代码。实验结果表明,该算法能够隐藏更多跨基本块冗余操作的开销,同时利用跨基本块的数据依赖生成更优的SIMD指令,有效地提高了向量化程序的加速比。
[1] Franchetti F,Kral S,Lorenz J,et al.Efficiem utilization of SIMD extensions[J].Proceedings ofthe IEEE,2005,93(2):409-425 [2] TMS320C6000CPU and Instruction Set Reference Guide(Rev.F)[M].TexasInstruments Inc.2000 [3] SC140DSP Core Reference Manual[R/OL].http:// cache.freescale.com/files/dsp/doc/ref_manual/MNSC140CORE.pdf,2012-05-20 [4] Fridman J,Greenfield Z.The Tiger SHARC DSP Architecture[J].IEEE Micro,2000,0(1):66-76 [5] Tanaka H,Ota Y,Matsumoto N,et al.A New CompilationTechnique for SIMD Code Generation Across Basic Block Boundaries[C]∥Design Automation Conference(ASP-DAC),201015th Asia and South Pacifi.Jan.2010:101-106 [6] Larsen S,Amarasinghe S.Exploiting superword level parallelism with multimedia instruction sets[C]∥Proc of the ACM SIGPLAN Conference on Programming Language Design and Implementation.June 2000:145-156 [7] Shin J,Hall M,Chame J.Superword-level Parallelism in thePresence of Control Flow[C]∥Proc.of the International Symposium on Code Generation and optimization.March 2005:165-175 [8] Nuzman D,Zaks A.Outer-loop vectorization:revisited for short simd architectures[C]∥Proceedings of the 17th international conference on parallel architectures and compilation techniques,PACT ’08.New York,NY,USA,ACM,2008:2-11 [9] Aho A V,Lam M S,Sethi R,et al.编译原理[M].赵建华,郑滔,戴新宇,译.北京:机械工业出版社,2009 [10] 陈火旺,刘春林,谭庆平,等.程序设计语言编译原理(第3版)[M].北京:国防工业出版社,2001 [11] Allen F E,Cocke J.A program data flow analysis procedure[J].Communications of ACM,1976,19(3) [12] Open64[EB/OL].http://open64.sourceforge.net,2012-09-10 |
No related articles found! |
|