Computer Science ›› 2014, Vol. 41 ›› Issue (7): 290-296.doi: 10.11896/j.issn.1002-137X.2014.07.060


Quasi-diagonal Matrix Hybrid Compression Algorithm and Implementation for SpMV on GPU

YANG Wang-dong, LI Ken-li and SHI Lin

• Online: 2018-11-14    Published: 2018-11-14

Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental kernel in sparse linear algebra and an important problem in scientific computing and engineering practice. Much effort has been devoted to accelerating SpMV, and a number of parallel solutions have been proposed. In this paper we focus on a special case of SpMV: sparse quasi-diagonal matrix-vector multiplication (SQDMV). Sparse quasi-diagonal matrices are key to solving many differential equations, yet little research has addressed them. We discuss data structures and algorithms for SQDMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. We present a new diagonal storage format, HDC, which overcomes the inefficiency of DIA in storing irregular matrices and the load imbalance of CSR in storing non-zero elements. Furthermore, HDC can adjust the storage bandwidth of the diagonals to adapt to different degrees of discreteness in the sparse matrix, achieving a higher compression ratio than DIA and CSR and reducing computational complexity. Our GPU implementation shows that HDC outperforms the other formats, especially for matrices with some discrete points outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
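The abstract describes a hybrid scheme: dense diagonals are stored DIA-style, while scattered non-zeros outside them fall back to a CSR-like remainder. The following is a minimal CPU sketch in Python of that general idea; it is not the authors' HDC format or their CUDA kernel, and the split heuristic (`density_threshold`), function names, and DIA column-alignment convention are all assumptions made for illustration.

```python
import numpy as np

def hybrid_from_dense(A, density_threshold=0.5):
    """Split a square matrix into (a) diagonals dense enough to store
    DIA-style and (b) a CSR-like remainder for scattered entries.
    Illustrative sketch only, not the paper's HDC format."""
    n = A.shape[0]
    offsets, dia_cols = [], []
    rest = A.astype(float).copy()
    for k in range(-n + 1, n):
        diag = np.diagonal(A, k)
        nnz = np.count_nonzero(diag)
        if nnz and nnz / len(diag) >= density_threshold:
            offsets.append(k)
            # DIA convention: entries aligned by column j, col[j] = A[j-k, j]
            col = np.zeros(n)
            col[max(0, k):max(0, k) + len(diag)] = diag
            dia_cols.append(col)
            rows = np.arange(max(0, -k), max(0, -k) + len(diag))
            rest[rows, rows + k] = 0.0  # remove from the remainder
    # CSR part for the leftover scattered non-zeros
    indptr, indices, values = [0], [], []
    for i in range(n):
        for j in np.nonzero(rest[i])[0]:
            indices.append(int(j)); values.append(rest[i, j])
        indptr.append(len(indices))
    return offsets, dia_cols, (indptr, indices, values)

def hybrid_spmv(offsets, dia_cols, csr, x):
    """y = A @ x using the DIA part plus the CSR remainder."""
    n = len(x)
    y = np.zeros(n)
    for k, col in zip(offsets, dia_cols):
        for j in range(max(0, k), min(n, n + k)):  # valid columns of diagonal k
            y[j - k] += col[j] * x[j]
    indptr, indices, values = csr
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            y[i] += values[p] * x[indices[p]]
    return y
```

Lowering `density_threshold` pushes more diagonals into the DIA part (better for banded matrices); raising it pushes sparse diagonals into the CSR remainder, mirroring the bandwidth-adjustment idea the abstract attributes to HDC. On a GPU, the two parts would be fused into one kernel, as the paper does.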

Key words: GPU, sparse matrix, SpMV, CUDA

