Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 220900277-6. doi: 10.11896/jsjkx.220900277
莫尚丰, 周振芬, 胡勇华, 徐敏敏, 毛春献, 袁钰迪
MO Shangfeng, ZHOU Zhenfen, HU Yonghua, XU Minmin, MAO Chunxian, YUAN Yudi
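Abstract: FT-M7002 is a high-performance DSP developed independently in China, with powerful vector processing capability. To exploit its performance advantages effectively, an efficient VSIP function library optimized and ported for FT-M7002 is urgently needed. Complex-domain row-vector-matrix multiplication is a frequently used algorithm in the VSIP library and appears widely in application areas such as digital communications and image processing. This paper studies and optimizes the complex-domain row-vector-matrix multiplication algorithm on the FT-M7002 DSP, improving its performance by computing over matrix row vectors instead of matrix column vectors, and by applying vectorization, loop unrolling, and software pipelining. Test results show that the optimized vector C algorithm achieves a speedup of 6.2 to 20.6 over the VSIP library function, and the assembly-optimized algorithm achieves a further speedup of 3.4 to 14.3 over the vector C algorithm, a clear acceleration.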
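The switch from column-wise to row-wise computation described in the abstract can be illustrated with a minimal scalar C sketch. It is not the authors' vector C or assembly code and uses no FT-M7002 intrinsics. The type `cplx` and the functions `cvmprod_by_columns` and `cvmprod_by_rows` are hypothetical names introduced here, and the matrix is assumed to be stored in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type: single-precision real and imaginary parts. */
typedef struct { float re, im; } cplx;

/* Column-oriented form: r[j] is the dot product of v with column j of the
 * row-major m x n matrix a, so the inner loop reads a with stride n. */
void cvmprod_by_columns(const cplx *v, const cplx *a, cplx *r,
                        size_t m, size_t n)
{
    assert(v != NULL && a != NULL && r != NULL);
    for (size_t j = 0; j < n; ++j) {
        cplx acc = {0.0f, 0.0f};
        for (size_t i = 0; i < m; ++i) {
            cplx x = v[i];
            cplx y = a[i * n + j];
            acc.re += x.re * y.re - x.im * y.im;
            acc.im += x.re * y.im + x.im * y.re;
        }
        r[j] = acc;
    }
}

/* Row-oriented form: r accumulates v[i] times row i of a. The inner loop
 * walks one matrix row and the output with unit stride, which is the access
 * pattern that vector loads, loop unrolling, and software pipelining favor. */
void cvmprod_by_rows(const cplx *v, const cplx *a, cplx *r,
                     size_t m, size_t n)
{
    assert(v != NULL && a != NULL && r != NULL);
    for (size_t j = 0; j < n; ++j) {
        r[j].re = 0.0f;
        r[j].im = 0.0f;
    }
    for (size_t i = 0; i < m; ++i) {
        cplx x = v[i];              /* one scalar, reused across the row */
        const cplx *row = a + i * n;
        for (size_t j = 0; j < n; ++j) {
            r[j].re += x.re * row[j].re - x.im * row[j].im;
            r[j].im += x.re * row[j].im + x.im * row[j].re;
        }
    }
}
```

Both functions compute the same product of a length-m row vector and an m x n matrix; only the loop order and the memory access pattern differ. On a vector DSP the row-oriented inner loop would be the natural candidate for vectorization, though the actual speedups reported in the abstract depend on the authors' implementation, not on this sketch.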