Computer Science ›› 2016, Vol. 43 ›› Issue (1): 226-231.doi: 10.11896/j.issn.1002-137X.2016.01.049

Previous Articles     Next Articles

Loop Unrolling in Vectorized Programs

GAO Wei, ZHAO Rong-cai, YU Hai-ning and ZHANG Qing-hua   

  • Online:2018-12-01 Published:2018-12-01

Abstract: Loop unrolling is a loop transformation that has been developed well and widely used in serial programs.However,it usually loses its efficiency in practical vectorized applications.To solve this problem,this paper proposed a loop unrolling technique for vectorized loops.First,we presented the CUFVL algorithm to automatically determine the unroll factors via taking the register pressure and code explosion problems into account.Second,we designed a complete unrolling strategy according to the specific characters of vectorized loops.Finally,we showed the flow chart of the proposed technique based on the CUFVL algorithm and the complete unrolling strategy.Experimental results show that the proposed technique can find out the most suitable unroll factor for the programs used in the experiments,thereby enhancing their performance efficiently.

Key words: Vectorized programs,Loop unrolling,Unroll factor,Complete unrolling

[1] Sarkar V.Optimized unrolling of nested loops[C]∥Proc.of the 14th Int’l Conf.on Supercomputing.New Mexico:ACM Press,2000
[2] Li W L,Liu L,Tang Z Z.Loop unrolling optimization for software pipelining[J].Journal of Beijing University of Aeronautics and Astronautics,2004,30(11):1111-1115(in Chinese)李文龙,刘利,汤志忠.软件流水中的循环展开优化[J].北京航空航天大学学报,2004,30(11):1111-1115
[3] Lam Y M,Coutinho J G F,Luk W,et al.Unrolling-based loop mapping and scheduling[C]∥2008 International Conference on Field-Programmable Technolugy(ICFPT’2008).Taipei,Taiwan,2008:321-324
[4] Callahan D,Kennedy K,Porterfield A.Software prefetching[C]∥Proc.of the 4th Int’l Conf.on Architectural Support for Programming Languages and Operating Systems.ACM Press,1991
[5] Zhu J H.Research on SIMD Compiling Optimization Techniques[D].Shanghai:Fudan University,2005(in Chinese)朱嘉华.SIMD编译优化技术研究[D].上海:复旦大学,2005
[6] Jiang W H,Mei C,Guo Y.Vectorization for Real-life MultimediaApplications on Processors’ Multimedia Extensions [J].Chinese Journal of Computers,2005,28(8):1254-1266(in Chinese) 姜伟华,梅超,郭一.一种针对多媒体扩展指令集和实际多媒体程序的自动向量化方法[J].计算机学报,2005,28(8):1254-1266
[7] Zhang W H,Zang B Y.Research on SIMD Compiling Optimization Techniques [J].Communications of the China Computer Federation(CCCF),2007,3(2):27-36(in Chinese)张为华,臧斌宇.SIMD编译优化技术研究概述[J].中国计算机学会通讯,2007,3(2):27-36
[8] Intel Corp.Intel C/C++ and Intel Fortran Compilers for Linux[EB/OL].http://www.intel.com/software/products/compilers
[9] The GNU Compiler Collection[EB/OL].http://gcc.gnu.org
[10] Pathscale Compiler User’s Guide[EB/OL].http://www.pathscale.com
[11] http://sourceforge.net/projects/open64/files/open64/Open64-5.0
[12] Bacon D F,Graham S L,Shap O J.Compiler Transformations for High-Performance Computing[J].ACM Computing Surveys,1994,26(4):345-420
[13] Alexander M J,Bailey M W,Childers B R,et al.Memory bandwidth optimizations for wide-bus machines[C]∥Proceedings of the 26th Hawaii International Conference on System Sciences.Wailea,Hawaii,1993:466-475
[14] Mowry T C.Tolerating Latency Through Software-ControlledData Prefetching[D].Stanford University,March 1994
[15] Fog A.Optimizing subroutines in assembly language[D].Co-penhagen University College of Engineering,September 2012
[16] Ma Y,Carr S.Register Pressure Guided Unroll-and-Jam[M]∥The 2008 Open64 Workshop.Boston,MA,USA,April 6,2008
[17] Carr S,Guan Y.Unroll and Jam using Uniformly Generated Sets[C]∥Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO-30).1997:349-357
[18] Carr S,Kennedy K.Improving the Ratio of Memory Operations to Floating-Point Operations in Loops[J].ACM Transactions on Programming Languages and Systems (ToPLaS),1994,16(6):1768-1810
[19] Mark S,Saman A.Predicting Unroll Factors Using SupervisedClassification[C]∥Proceedings of the International Symposium on Code Generation and Optimization(CGO).2003:204-215
[20] Monsifrot A,Bodin F,Quiniou R.A Machine Learning Ap-proach to Automatic Production of Compiler Heuristics[M]∥Artificial Intelligence:Methodology,Systems,Applications.2002:41-50
[21] Mowry T C,Lam M S,Gupta A.Design and evaluation of a compiler algorithm for prefetching[C]∥Proceeding of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems.Massachusetts:ACM Press,1992:62-73

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!