Computer Science ›› 2022, Vol. 49 ›› Issue (6A): 777-783. doi: 10.11896/jsjkx.210400146

• Interdiscipline & Application •

Study on Hybrid Resource Heuristic Loop Unrolling Factor Selection Method Based on Vector DSP

LU Hao-song, HU Yong-hua, WANG Shu-ying, ZHOU Xin-lian, LI Hui-xiang   

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: LU Hao-song, born in 1995, postgraduate. His main research interests include DSP compilation and code optimization technology.
    HU Yong-hua, born in 1981, Ph.D., professor, Ph.D. supervisor. His main research interests include DSP compilation and code optimization technology.
  • Supported by:
    Research Projects of Hunan Provincial Department of Education (20B242, 19A169), Natural Science Foundation of Hunan Province, China (2017JJ3087) and National Natural Science Foundation of China (61872138).

Abstract: Among modern microprocessors, the very long instruction word (VLIW) architecture with integrated vector processing units has become a typical high-performance digital signal processor (DSP) architecture. Its main characteristics are abundant register resources and a large number of instruction execution units. Based on these characteristics, a method for selecting the loop unrolling factor is proposed to improve the effect of loop unrolling optimization. The method takes into account whether the code in a loop body is vector or scalar, as well as the usage rules of base address registers and index registers. In addition, two further heuristics are used in the selection algorithm: the proportion of times that the execution units are used, and the power alignment of the unrolling factor. Experiments on three commonly used digital signal processing algorithms demonstrate the method's ability to expose more instruction-level parallelism. The experimental results show that the average performance of the algorithms improves by more than 10% compared with existing methods. In particular, experiments on the FFT algorithm show that, through the hybrid resource heuristics, the proposed method analyzes the usage of the relevant hardware resources more accurately, decides whether to unroll, and obtains the corresponding loop unrolling factor.
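
The abstract names the ingredients of the selection method but not its formulas, so the following is only a rough sketch of how such heuristics could be combined, not the authors' algorithm. It assumes a register budget that bounds the factor, a per-class count of execution-unit uses in one loop iteration, and candidate factors restricted to powers of two (one possible reading of "power alignment"). The function name, parameters, and cost model are all hypothetical.

```python
def select_unroll_factor(free_regs, regs_per_iter, unit_uses, unit_slots,
                         max_factor=16):
    """Pick a power-of-two unroll factor under register and unit limits.

    Hypothetical sketch (not the paper's algorithm):
      free_regs     -- registers available to the loop
      regs_per_iter -- registers one copy of the loop body needs
      unit_uses     -- {unit class: uses in one loop iteration}
      unit_slots    -- {unit class: issue slots per cycle}
    """
    # Register heuristic: each unrolled copy needs its own registers.
    reg_bound = max(1, free_regs // max(1, regs_per_iter))
    limit = min(reg_bound, max_factor)

    best_factor, best_cost = 1, None
    factor = 1
    while factor <= limit:  # candidates are 1, 2, 4, ... (power alignment)
        # Execution-unit heuristic: the busiest unit class sets the
        # minimum number of cycles for the unrolled body.
        cycles = max(
            (factor * unit_uses[u] + unit_slots[u] - 1) // unit_slots[u]
            for u in unit_uses
        )
        cost = cycles / factor  # estimated cycles per original iteration
        if best_cost is None or cost < best_cost:
            best_factor, best_cost = factor, cost
        factor *= 2
    return best_factor


# Example: 3 multiply-accumulate ops and 2 loads per iteration,
# with two issue slots for each unit class.
chosen = select_unroll_factor(
    free_regs=64, regs_per_iter=6,
    unit_uses={"mac": 3, "load": 2},
    unit_slots={"mac": 2, "load": 2},
)
```

In this example a factor of 2 already fills both multiply-accumulate slots in every cycle, so larger factors bring no further gain under this simple model and the smallest such factor is kept. When the register budget cannot hold two copies of the loop body, the sketch returns 1, which corresponds to not unrolling.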

Key words: Compiler optimization, Loop unrolling, Unrolling factor, Vector DSP, VLIW

CLC Number: 

  • TP314