计算机科学 ›› 2016, Vol. 43 ›› Issue (1): 226-231.doi: 10.11896/j.issn.1002-137X.2016.01.049
高伟,赵荣彩,于海宁,张庆花
GAO Wei, ZHAO Rong-cai, YU Hai-ning and ZHANG Qing-hua
摘要: 循环展开是一项常用的循环优化技术。当前针对串行程序的循环展开技术已经比较成熟,但是在实际应用中没有针对向量程序进行有效的循环展开。为了解决这个问题,提出了一种面向向量程序的循环展开技术。首先,针对向量寄存器压力和代码膨胀等限制因素,提出了一种自动计算展开因子的CUFVL算法;其次,根据向量循环展开的特点,制定了完全展开策略;最后结合CUFVL算法和完全展开策略,设计了向量循环展开的总体流程。实验结果表明,该方案能够计算出合适的展开因子,进而对向量程序进行适当的循环展开或完全展开,从而有效提升应用程序的性能。
[1] Sarkar V.Optimized unrolling of nested loops[C]∥Proc.of the 14th Int’l Conf.on Supercomputing.New Mexico:ACM Press,2000 [2] Li W L,Liu L,Tang Z Z.Loop unrolling optimization for software pipelining[J].Journal of Beijing University of Aeronautics and Astronautics,2004,30(11):1111-1115(in Chinese)李文龙,刘利,汤志忠.软件流水中的循环展开优化[J].北京航空航天大学学报,2004,30(11):1111-1115 [3] Lam Y M,Coutinho J G F,Luk W,et al.Unrolling-based loop mapping and scheduling[C]∥2008 International Conference on Field-Programmable Technolugy(ICFPT’2008).Taipei,Taiwan,2008:321-324 [4] Callahan D,Kennedy K,Porterfield A.Software prefetching[C]∥Proc.of the 4th Int’l Conf.on Architectural Support for Programming Languages and Operating Systems.ACM Press,1991 [5] Zhu J H.Research on SIMD Compiling Optimization Techniques[D].Shanghai:Fudan University,2005(in Chinese)朱嘉华.SIMD编译优化技术研究[D].上海:复旦大学,2005 [6] Jiang W H,Mei C,Guo Y.Vectorization for Real-life MultimediaApplications on Processors’ Multimedia Extensions [J].Chinese Journal of Computers,2005,28(8):1254-1266(in Chinese) 姜伟华,梅超,郭一.一种针对多媒体扩展指令集和实际多媒体程序的自动向量化方法[J].计算机学报,2005,28(8):1254-1266 [7] Zhang W H,Zang B Y.Research on SIMD Compiling Optimization Techniques [J].Communications of the China Computer Federation(CCCF),2007,3(2):27-36(in Chinese)张为华,臧斌宇.SIMD编译优化技术研究概述[J].中国计算机学会通讯,2007,3(2):27-36 [8] Intel Corp.Intel C/C++ and Intel Fortran Compilers for Linux[EB/OL].http://www.intel.com/software/products/compilers [9] The GNU Compiler Collection[EB/OL].http://gcc.gnu.org [10] Pathscale Compiler User’s Guide[EB/OL].http://www.pathscale.com [11] http://sourceforge.net/projects/open64/files/open64/Open64-5.0 [12] Bacon D F,Graham S L,Shap O J.Compiler Transformations for High-Performance Computing[J].ACM Computing Surveys,1994,26(4):345-420 [13] Alexander M J,Bailey M W,Childers B R,et al.Memory bandwidth optimizations for wide-bus machines[C]∥Proceedings of the 26th Hawaii International Conference on System Sciences.Wailea,Hawaii,1993:466-475 [14] Mowry T C.Tolerating Latency Through Software-ControlledData Prefetching[D].Stanford University,March 1994 [15] Fog A.Optimizing subroutines in assembly language[D].Co-penhagen University College of Engineering,September 2012 [16] Ma Y,Carr S.Register Pressure Guided Unroll-and-Jam[M]∥The 2008 Open64 Workshop.Boston,MA,USA,April 6,2008 [17] Carr S,Guan Y.Unroll and Jam using Uniformly Generated Sets[C]∥Proceedings of the 30th Annual International Symposium on Microarchitecture (MICRO-30).1997:349-357 [18] Carr S,Kennedy K.Improving the Ratio of Memory Operations to Floating-Point Operations in Loops[J].ACM Transactions on Programming Languages and Systems (ToPLaS),1994,16(6):1768-1810 [19] Mark S,Saman A.Predicting Unroll Factors Using SupervisedClassification[C]∥Proceedings of the International Symposium on Code Generation and Optimization(CGO).2003:204-215 [20] Monsifrot A,Bodin F,Quiniou R.A Machine Learning Ap-proach to Automatic Production of Compiler Heuristics[M]∥Artificial Intelligence:Methodology,Systems,Applications.2002:41-50 [21] Mowry T C,Lam M S,Gupta A.Design and evaluation of a compiler algorithm for prefetching[C]∥Proceeding of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems.Massachusetts:ACM Press,1992:62-73 |
No related articles found! |
|