Computer Science ›› 2014, Vol. 41 ›› Issue (7): 290-296. doi: 10.11896/j.issn.1002-137X.2014.07.060
YANG Wang-dong, LI Ken-li and SHI Lin
Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing and engineering applications, and its high-performance implementation and optimization is one of the research hotspots in computational science. Solving differential equations produces large-scale sparse matrices, a large proportion of which are quasi-diagonal. To handle the irregularities found in quasi-diagonal matrices, this paper proposes a hybrid storage format combining diagonal storage (DIA) and compressed sparse row storage (CSR) for SpMV computation. The scattered non-zero elements lying outside the partitioned diagonal band are stored in CSR, which avoids DIA's drawback of a rapidly growing number of stored columns on irregular matrices; meanwhile, storing the diagonal band in DIA fully exploits the matrix's diagonal structure and reduces the imbalance in the number of non-zeros per CSR row. By adjusting the bandwidth of the stored diagonal band, the format adapts to the different scattering patterns of quasi-diagonal matrices, achieving a higher compression ratio than either DIA or CSR alone and reducing the data volume of the computation. Experiments on a GPU using the CUDA platform show that the proposed method achieves a higher speedup than both DIA and CSR.
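The partitioning described in the abstract can be illustrated with a minimal CPU-side sketch: non-zeros on a chosen set of diagonals are stored DIA-style, the remaining scattered non-zeros go into CSR, and SpMV sums both contributions. The function names and the particular diagonal offsets below are illustrative assumptions for the example, not the paper's GPU/CUDA implementation.

```python
# Minimal sketch of the hybrid DIA + CSR storage scheme (assumed names,
# pure Python; the paper's actual kernels run on the GPU under CUDA).

def build_hybrid(dense, diag_offsets):
    """Split a dense n x n matrix into a DIA part for the given diagonal
    offsets plus CSR arrays (vals, cols, ptr) for all remaining non-zeros."""
    n = len(dense)
    offs = set(diag_offsets)
    # DIA part: one length-n array per stored diagonal, zero-padded
    # where the diagonal runs off the matrix.
    dia = {k: [dense[i][i + k] if 0 <= i + k < n else 0.0 for i in range(n)]
           for k in diag_offsets}
    # CSR part: scattered non-zeros outside the stored diagonal band.
    vals, cols, ptr = [], [], [0]
    for i in range(n):
        for j in range(n):
            if dense[i][j] != 0 and (j - i) not in offs:
                vals.append(dense[i][j])
                cols.append(j)
        ptr.append(len(vals))
    return dia, (vals, cols, ptr)

def hybrid_spmv(dia, csr, x):
    """Compute y = A @ x from the hybrid representation."""
    n = len(x)
    y = [0.0] * n
    for k, col in dia.items():          # DIA contribution
        for i in range(n):
            j = i + k
            if 0 <= j < n:
                y[i] += col[i] * x[j]
    vals, cols, ptr = csr               # CSR contribution
    for i in range(n):
        for p in range(ptr[i], ptr[i + 1]):
            y[i] += vals[p] * x[cols[p]]
    return y
```

With a narrow diagonal band stored in DIA, only the few stray non-zeros land in CSR, so each CSR row carries a small, balanced number of elements; widening the band trades DIA padding against CSR size, which is the tuning knob the abstract refers to.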