计算机科学 ›› 2013, Vol. 40 ›› Issue (10): 24-28.

• 综述 • 上一篇    下一篇

基于跨基本块变换和循环分布的SLP优化技术

索维毅,赵荣彩,姚远,张小妹   

  1. 解放军信息工程大学 郑州450002;解放军信息工程大学 郑州450002;解放军信息工程大学 郑州450002;解放军73311部队自动化站 晋江362200
  • 出版日期:2018-11-16 发布日期:2018-11-16
  • 基金资助:
    本文受核高基重大专项(2009ZX01036-001-001-2)资助

SLP Optimization Algorithm Using Across Basic Block Transformation and Loop Distribution

SUO Wei-yi,ZHAO Rong-cai,YAO Yuan and ZHANG Xiao-mei   

  • Online:2018-11-16 Published:2018-11-16

摘要: 现有的SLP优化算法无法处理内层循环中存在的依赖环和归约,并且在基本块边界产生大量的冗余拆包和赋值语句,从而导致向量化效率不高。针对该问题,提出了一种基于跨基本块变换和循环分布的SLP优化算法。该算法以控制流图为基础,根据基本块间各数组变量的Define-Use关系以及跨越基本块之间的数据依赖关系进行跨基本块的向量化变换,有序地采用跨基本块变换和循环分布,尽可能发掘最内层循环基本块内语句的并行性,使SLP自动向量化编译器生成具有更多SIMD指令的向量化代码。实验结果表明,该算法能够隐藏更多跨基本块冗余操作的开销,同时利用跨基本块的数据依赖生成更优的SIMD指令,有效地提高了向量化程序的加速比。

关键词: SLP,跨基本块变换,循环分布,数据依赖,控制流图,Define-Use关系

Abstract: The existing SLP algorithms cannot handle dependent ring and the reduction of the inner loop,and generate a large number of redundant packet disassembly and assignment statements in a basic block boundary,which leads to the lower quantization efficiency.In order to solve the problem,this paper proposed a SLP optimization algorithm using cross basic block transformation and loop distribution.Based on the control flow graph,according to the basic blocks of the array variable between Define-Use and across basic block data relation between across basic block,the algorithm makes the quantized transform,orderly uses across basic block transform and loop distribution,and then expands inner loop within a basic block sentence parallelism as far as possible,making SLP automatic vectorization compiler to generate the vectorization code which has more SIMD instruction.The experimental results show that the algorithm can hide more across basic block redundancy operation cost,at the same time generate better SIMD instructions across basic block data dependence,effectively improving the vectorization program speedup.

Key words: SLP,Cross basic block,Loop distribution,Data dependence,Control flow graph,Define-Use relationship

[1] Franchetti F,Kral S,Lorenz J,et al.Efficiem utilization of SIMD extensions[J].Proceedings ofthe IEEE,2005,93(2):409-425
[2] TMS320C6000CPU and Instruction Set Reference Guide(Rev.F)[M].TexasInstruments Inc.2000
[3] SC140DSP Core Reference Manual[R/OL].http:// cache.freescale.com/files/dsp/doc/ref_manual/MNSC140CORE.pdf,2012-05-20
[4] Fridman J,Greenfield Z.The Tiger SHARC DSP Architecture[J].IEEE Micro,2000,0(1):66-76
[5] Tanaka H,Ota Y,Matsumoto N,et al.A New CompilationTechnique for SIMD Code Generation Across Basic Block Boundaries[C]∥Design Automation Conference(ASP-DAC),201015th Asia and South Pacifi.Jan.2010:101-106
[6] Larsen S,Amarasinghe S.Exploiting superword level parallelism with multimedia instruction sets[C]∥Proc of the ACM SIGPLAN Conference on Programming Language Design and Implementation.June 2000:145-156
[7] Shin J,Hall M,Chame J.Superword-level Parallelism in thePresence of Control Flow[C]∥Proc.of the International Symposium on Code Generation and optimization.March 2005:165-175
[8] Nuzman D,Zaks A.Outer-loop vectorization:revisited for short simd architectures[C]∥Proceedings of the 17th international conference on parallel architectures and compilation techniques,PACT ’08.New York,NY,USA,ACM,2008:2-11
[9] Aho A V,Lam M S,Sethi R,et al.编译原理[M].赵建华,郑滔,戴新宇,译.北京:机械工业出版社,2009
[10] 陈火旺,刘春林,谭庆平,等.程序设计语言编译原理(第3版)[M].北京:国防工业出版社,2001
[11] Allen F E,Cocke J.A program data flow analysis procedure[J].Communications of ACM,1976,19(3)
[12] Open64[EB/OL].http://open64.sourceforge.net,2012-09-10

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!