Computer Science ›› 2010, Vol. 37 ›› Issue (8): 168-171.


Optimizing Sparse Matrix-vector Multiplication Based on GPU

BAI Hong-tao,OUYANG Dan-tong,LI Xi-ming,LI Ting,HE Li-li   

  • Online:2018-12-01 Published:2018-12-01

Abstract: Sparse matrix computations present additional challenges for harnessing the potential of modern graphics processing units (GPUs) for general-purpose computing. We investigated various optimizations of thread mapping, data reuse, and related techniques, and then proposed a parallel sparse matrix-vector multiplication (SpMV) on the GPU with the compute unified device architecture (CUDA), based on the compressed sparse row (CSR) structure. The optimizations include: (1) processing each row's elements with a half-warp of threads, which are synchronization-free within one warp; (2) aligning addresses to integer boundaries to achieve coalesced accesses; (3) reusing data by reading the vector from the texture memory in which it resides; (4) transferring data through page-locked memory; (5) accumulating partial results in shared memory. We compared the performance of our approach with that of efficient parallel SpMV implementations: (1) the one from NVIDIA's CUDPP library and (2) the one from NVIDIA's SpMV library. Our approach outperforms both in memory bandwidth and GFLOPS. In addition, the overall performance of our approach is three times that of a CPU counterpart.

Key words: Sparse matrix, Compressed sparse row, Graphics processing unit, Compute unified device architecture, Optimizations
