Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240300156-7. doi: 10.11896/jsjkx.240300156

• Computer Software & Architecture •

Research on Efficient Code Generation Techniques for Array Computation for Vector DSPs

LIAO Zeming, LIU Guikai, HU Yonghua, XIE Anxing

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411100, China
  • Online: 2025-06-16 Published: 2025-06-12
  • Corresponding author: HU Yonghua (huyh@hnust.cn)
  • Author contact: 1003191976@qq.com
  • About author: LIAO Zeming, born in 1998, postgraduate. His main research interests include high performance computing and code generation.
    HU Yonghua, born in 1981, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. E200034324M). His main research interests include compilation technology, code optimization for parallel computing, etc.
  • Supported by:
    Natural Science Foundation of Hunan Province, China (2023JJ50019, 2024JJ7172).

Abstract: With the continued development of large-scale integrated circuit technology, vector DSPs that combine instruction-level parallel techniques such as SIMD and VLIW have attracted growing attention and use in high-performance computing. Adapting the many kinds of algorithm function libraries to these processors is one of the key challenges. Application development efficiency improves only when developers spend less effort on repetitive programming work and more on optimizing code for the vector DSP architecture and its hardware resources. Taking into account the amount of data involved in each computation, this paper proposes a template-based method for automatically generating efficient array-computation code. The method automates dynamic cache allocation, rearranges data for non-contiguous memory accesses, and optimizes scalar instructions, so that the generated code can use the processor's dedicated vector resources. Experimental results show that the technique substantially improves the efficiency of obtaining the relevant function code. On average, the generated vector assembly code reaches about 75% of the performance of handwritten assembly code and achieves an 8.7× speedup over scalar assembly code.

Key words: High-performance computing, Code generation, Automatic vectorization, Vector DSP

CLC Number: 

  • TP314
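
The abstract describes template-based generation that takes the number of data elements into account. As a rough illustration of that idea only, the sketch below fills a vector template for every full group of elements and a scalar template for the remainder. Everything specific here is an assumption and not taken from the paper: the mnemonics (`VLOAD`, `SLOAD`, ...), the register names, the 16-lane vector width, the full unrolling, and the function name `generate_array_op` are all hypothetical.

```python
from string import Template

# Hypothetical mnemonics and register names, for illustration only.
# They do not correspond to any real DSP instruction set.
VECTOR_TEMPLATE = Template(
    "    VLOAD  VR0, [$src + $offset]\n"
    "    V$op   VR0, VR0, VR1\n"
    "    VSTORE VR0, [$dst + $offset]\n"
)
SCALAR_TEMPLATE = Template(
    "    SLOAD  R0, [$src + $offset]\n"
    "    S$op   R0, R0, R1\n"
    "    SSTORE R0, [$dst + $offset]\n"
)


def generate_array_op(op, n, lanes=16, src="A", dst="B"):
    """Emit pseudo-assembly applying `op` to n elements.

    Full groups of `lanes` elements use the vector template;
    the leftover elements fall back to the scalar template.
    """
    if n < 0 or lanes <= 0:
        raise ValueError("n must be >= 0 and lanes must be > 0")

    full_vectors = n // lanes
    parts = [f"{op.lower()}_{n}:\n"]

    # One vector block per full group of `lanes` elements.
    for i in range(full_vectors):
        parts.append(
            VECTOR_TEMPLATE.substitute(op=op, src=src, dst=dst, offset=i * lanes)
        )

    # Scalar tail for the elements that do not fill a vector.
    for j in range(full_vectors * lanes, n):
        parts.append(
            SCALAR_TEMPLATE.substitute(op=op, src=src, dst=dst, offset=j)
        )

    return "".join(parts)


if __name__ == "__main__":
    print(generate_array_op("ADD", 35))
```

With 35 elements and the assumed 16 lanes, the sketch emits two vector blocks followed by three scalar blocks. A real generator would presumably emit loops rather than fully unrolled code and would also handle buffer allocation and data rearrangement, which this sketch leaves out.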