Computer Science ›› 2014, Vol. 41 ›› Issue (7): 290-296. doi: 10.11896/j.issn.1002-137X.2014.07.060

• Artificial Intelligence •

Quasi-diagonal Matrix Hybrid Compression Algorithm and Implementation for SpMV on GPU

YANG Wang-dong, LI Ken-li and SHI Lin

  1. College of Information Science and Engineering, Hunan City University, Yiyang 413082, China; 2. National Supercomputing Center in Changsha, Changsha 410082, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the Key Program of the National Natural Science Foundation of China (61133005), the National Natural Science Foundation of China (61070057), the National Key Technology R&D Program of China (2012BAH09B02), the Cultivation Fund of the Key Scientific and Technical Innovation Project of the Ministry of Education of China (708066), the Specialized Research Fund for the Doctoral Program of Higher Education (20100161110019), the Program for New Century Excellent Talents in University (NCET-08-0177), and the Key Scientific Research Project of the Hunan Provincial Department of Education (13A011).


Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing and engineering applications, and its high-performance implementation and optimization is one of the research hotspots in computational science. Solving differential equations produces large-scale sparse matrices, a large proportion of which are quasi-diagonal. To handle the irregularities present in quasi-diagonal matrices, this paper proposes a hybrid of the diagonal (DIA) and compressed sparse row (CSR) storage formats for SpMV. Scattered non-zero elements outside the extracted diagonal band are stored in CSR, which avoids the rapid growth in stored columns that DIA suffers on irregular matrices, while the diagonal band itself is stored in DIA, which exploits the matrix's diagonal structure and reduces the imbalance in per-row non-zero counts that hurts CSR. The bandwidth of the stored diagonal band can be adjusted to fit the different scattering patterns of quasi-diagonal matrices, yielding a higher compression ratio than either DIA or CSR and reducing the volume of data involved in the computation. Experiments on a GPU using the CUDA platform show that the proposed method achieves a higher speedup than DIA and CSR.
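As background for the two formats the abstract combines, the following is a minimal CPU reference sketch of CSR storage and CSR-based SpMV. This is our illustrative code, not the paper's implementation; the paper's CUDA kernels parallelize the same arithmetic across GPU threads.

```python
# CSR (compressed sparse row) stores the non-zeros row by row:
# `values` holds the non-zero entries, `col_idx` their column indices,
# and `row_ptr[i]:row_ptr[i+1]` delimits row i's slice of both arrays.

def dense_to_csr(A):
    """Convert a dense matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

# Tridiagonal example: every row has 2-3 non-zeros near the diagonal.
A = [[4, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
vals, cols, ptrs = dense_to_csr(A)
print(csr_spmv(vals, cols, ptrs, [1, 2, 3, 4]))  # → [6, 12, 18, 19]
```

The per-row loop bound `row_ptr[i+1] - row_ptr[i]` is what varies between rows; when a few rows carry far more non-zeros than the rest, the GPU threads assigned to them dominate the running time, which is the CSR imbalance the abstract refers to.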

Key words: GPU, sparse matrix, sparse matrix-vector multiplication (SpMV), CUDA    CLC number: TP311    Document code: A

Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental kernel in sparse linear algebra and an important problem in scientific computing and engineering practice. Much effort has been put into accelerating SpMV, and a number of parallel solutions have been proposed. This paper focuses on a special case, sparse quasi-diagonal matrix-vector multiplication (SQDMV). Sparse quasi-diagonal matrices are key to solving many differential equations, yet little research has been done on this class. We discuss data structures and algorithms for SQDMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. We present a new diagonal storage format, HDC, which overcomes both the inefficiency of DIA in storing irregular matrices and the per-row non-zero imbalance of CSR. Furthermore, HDC can adjust the storage bandwidth of the diagonal band to adapt to different degrees of scattering in the sparse matrix, achieving a higher compression ratio than DIA or CSR and reducing the computational cost. Our GPU implementation shows that HDC outperforms the other formats, especially for matrices with scattered non-zeros outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
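The hybrid split described above can be sketched on the CPU as follows. This is our illustrative reconstruction of the idea, not the paper's HDC definition: the function names and the simple band-splitting rule (keep all diagonals within a chosen half-bandwidth in DIA, push everything else into CSR) are our assumptions.

```python
# Split a quasi-diagonal matrix into a DIA part (diagonals within the
# band) and a CSR part (stray non-zeros outside the band), then compute
# y = A @ x as the sum of the two contributions.

def split_hybrid(A, half_bandwidth):
    n = len(A)
    # DIA part: one zero-padded array per retained diagonal offset d,
    # where entry i holds A[i][i + d] when that position exists.
    dia = {d: [A[i][i + d] if 0 <= i + d < n else 0 for i in range(n)]
           for d in range(-half_bandwidth, half_bandwidth + 1)}
    # CSR part: every non-zero strictly outside the band.
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j in range(n):
            if abs(j - i) > half_bandwidth and A[i][j] != 0:
                values.append(A[i][j])
                col_idx.append(j)
        row_ptr.append(len(values))
    return dia, (values, col_idx, row_ptr)

def hybrid_spmv(dia, csr, x):
    n = len(x)
    y = [0] * n
    for d, diag in dia.items():            # DIA contribution
        for i in range(n):
            if 0 <= i + d < n:
                y[i] += diag[i] * x[i + d]
    values, col_idx, row_ptr = csr         # CSR contribution
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# Quasi-diagonal matrix: tridiagonal plus one stray element at (0, 3).
A = [[4, 1, 0, 7],
     [1, 4, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 4]]
dia, csr = split_hybrid(A, half_bandwidth=1)
print(hybrid_spmv(dia, csr, [1, 2, 3, 4]))  # → [34, 12, 18, 19]
```

With pure DIA, the single stray element at (0, 3) would force a whole extra zero-padded diagonal into storage; with pure CSR, row 0 would carry more non-zeros than the other rows. The split keeps the regular band in the cheap DIA layout and isolates the irregularity in a small CSR structure, which is the trade-off the abstract's tunable bandwidth controls.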


