计算机科学 ›› 2022, Vol. 49 ›› Issue (6A): 777-783.doi: 10.11896/jsjkx.210400146
陆浩松, 胡勇华, 王书盈, 周新莲, 李慧祥
LU Hao-song, HU Yong-hua, WANG Shu-ying, ZHOU Xin-lian, LI Hui-xiang
摘要: 在现代处理器中,具有向量处理单元的VLIW体系结构逐渐成为高性能DSP体系结构的典型代表。基于这类体系结构的寄存器资源丰富、执行单元多等特点,研究了相应的循环展开因子选择问题,提出了一种循环展开因子选择方法来提升循环展开这种重要优化的效果。该方法考虑了循环体代码的向量标量属性、基址寄存器和索引寄存器资源使用规则等因素,并且在展开因子选择算法中增加了执行单元使用占比和展开因子按幂次对齐这两种启发式因素。针对3种常用数字信息处理算法开展了实验研究,实验结果表明了该方法的有效性。对于这三种DSP算法,用所提方法获得的循环展开因子进行循环展开处理后,它们的平均性能相比已有方法提升了10%以上。
中图分类号:
[1] LEE Y,AVIZIENIS R,BISHARA A,et al.Exploring thetradeoffs between programmability and efficiency in data-parallel accelerators[J].ACM Sigarch Computer Architecture News,2011,39(3):129-140. [2] LIN C C,GU N J,LEI Y M,et al.SIMD compiler optimization of clustered VLIW DSP[J].Journal of University of Science and Technology of China,2011(8):53-59. [3] HE G Q,CHEN Y.Research on Huarui DSP software architecture[J].Modern Radar,2016(9):17-22. [4] BLAKE G,DRESLINSKI R G,MUDGE T.A survey of mul-ticore processors[J].IEEE Signal Processing Magazine,2009,26(6):26-37. [5] WOH M,SEO S,MAHLKE S A,et al.AnySP:Anytime Anywhere Anyway Signal Processing[C]//36th International Symposium on Computer Architecture(ISCA 2009).Austin,TX,USA.ACM,2009:20-24. [6] ROWEN C,DAN N,RAVINDRAN R,et al.The world's fastest DSP core:Breaking the 100 GMAC/s barrier[C]//2011 IEEE Hot Chips 23 Symposium(HCS).IEEE,2011. [7] CHEN S M,LIU S,WAN J H,et al.Architecture and implementation of collaborative multi-core DSP YHFT qmbase[J].Chinese Science:Information Science,2015,45(4):560-573. [8] ZHOU N,WANG R,QIN Y Y,et al.Design and implementation of inter core communication mechanism based on heterogeneous multi-core environment[J].Computer Engineering and Design,2019,40(3):294-300,308. [9] PADUA D A.Advanced compiler optimizations for supercomputers[J].Communications of the ACM,1986,29(12):1184-1201. [10] SHIVAM A,WATKINSON N,NICOLAU A,et al.Towards an Achievable Performance for the Loop Nests[J].arXiv:1902.00603,2019. [11] CUI Y Z,LIU S,WANG Q,et al.Study on cyclic optimization technique of lattice Boltzmann method[J].Acta Computa Sinica,2020,450(6):116-132. [12] WEISS S,SMITH J E.A study of scalar compilation techniques for pipelined supercomputers[J].ACM Sigarch Computer Architecture News,1990,15(5):105-109. [13] SARKAR V.Optimized Unrolling of Nested Loops[J].International Journal of Parallel Programming,2001,29(5):545-581. [14] CARR S,GUAN Y.Unroll-and-jam using uniformly generated sets[C]//Proceedings of 30th Annual International Symposium on Microarchitecture.IEEE,1997:349-357. [15] CARR S M, KENNEDY K W.Improving the ratio of memory operations to floating-point operations in loops[J].ACM Tran-sactions on Programming Languages & Systems,1994,16(6):1768-1810. [16] LI W L,LIU L,TANG Z Z.Optimization of loop unfolding in software pipeline[J].Journal of Beijing University of Aeronautics and Astronautics,2004,30(11):1111-1115. [17] HUANG Y B,LI C J.Implementation of tail loop Vectorization Based on llvm[C]//Proceedings of the 20th Annual Conference of Computer Engineering and Technology and the 6th Microprocessor Technology Forum.Computer Society of China,2016. |
[1] | 唐镇, 胡勇华, 陆浩松, 王书盈. 基于弱约束指派的DSP寄存器偶对分配算法研究 Research on DSP Register Pairs Allocation Algorithm with Weak Assigning Constraints 计算机科学, 2021, 48(6A): 587-595. https://doi.org/10.11896/jsjkx.200600061 |
[2] | 胡伟方, 陈云, 李颖颖, 商建东. 基于数据重用分析的多面体循环合并策略 Loop Fusion Strategy Based on Data Reuse Analysis in Polyhedral Compilation 计算机科学, 2021, 48(12): 49-58. https://doi.org/10.11896/jsjkx.210200071 |
[3] | 池昊宇, 陈长波. 基于神经网络的循环分块大小预测 Prediction of Loop Tiling Size Based on Neural Network 计算机科学, 2020, 47(8): 62-70. https://doi.org/10.11896/jsjkx.191200180 |
[4] | 高伟,赵荣彩,于海宁,张庆花. 循环展开技术在向量程序中的应用 Loop Unrolling in Vectorized Programs 计算机科学, 2016, 43(1): 226-231. https://doi.org/10.11896/j.issn.1002-137X.2016.01.049 |
[5] | 刘鹏,赵荣彩,赵博,高伟. 一种面向SIMD扩展部件的向量化统一架构 Unified Vectorization Framework for SIMD Extensions 计算机科学, 2014, 41(9): 28-31. https://doi.org/10.11896/j.issn.1002-137X.2014.09.004 |
[6] | 葛红美,徐超,陈念,廖希密. 一种面向嵌入式系统总线的低功耗优化方法 Low Power Optimization Method Oriented to Embedded System’s Bus 计算机科学, 2013, 40(12): 31-36. |
[7] | 田祖伟,孙光. 基于谓词代码的编译优化技术研究 Research of Compiler Optimization Technology Based on Predicated Code 计算机科学, 2010, 37(5): 130-133. |
[8] | . 基于汇编代码的指令调度器的设计与实现 计算机科学, 2009, 36(3): 45-47. |
[9] | . 数据流分析的关键技术研究 计算机科学, 2005, 32(12): 91-93. |
[10] | 连瑞琦 张兆庆. 低功耗编译的若干相关技术 计算机科学, 2004, 31(8): 164-167. |
[11] | 舒辉 康绯. CME分析中的丢番图方程求解 计算机科学, 2002, 29(7): 149-151. |
|