Computer Science ›› 2017, Vol. 44 ›› Issue (4): 182-187.doi: 10.11896/j.issn.1002-137X.2017.04.040

Previous Articles     Next Articles

Performance Model of Sparse Matrix Vector Multiplication on GPU

YIN Meng-jia, XU Xian-bin, HE Shui-bing, HU Jing, YE Cong-huan and ZHANG Tao   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Sparse matrix-vector multiplication algorithm is widely used in large-scale linear system and solving matrix eigenvalue problems.It also often becomes the bottleneck of processing in iterative process and affects the whole algorithm performance.Choosing a different store format for different forms of matrix,the corresponding algorithms tend to generate large performance impact.Through experimental analysis,the variation performance characteristics were found under different storage structure,so as to build up an effective performance measurement model for assessment of sparse matrix computation overhead,and we selected reasonable storage format and made effective guidance.Based on 14 groups of CSR,COO,HYB format,8 groups of ELL format test cases,the difference between the performance prediction model and measurement is less than 9%.

Key words: GPU,Sparse matrix-vector multiplication,Performance model

[1] SHERESHEVSKY M,CUKIC B,CROWEL J,et al.Software Aging and Multifractality of Memory Resources[C]∥Procee-dings of DSN 2003.Los Alamitos,USA:IEEE Computer Society,2003:721-730.
[2] BAI H T,OUYANG D T,LI X M,et al.Optimizing Sparse Matrix-vecotr Multiplication Based on GPU[J].Computer Science,2010,7(8):168-171,1.(in Chinese) 白洪涛,欧阳丹彤,李熙铭,等.基于GPU的稀疏矩阵向量乘优化[J].计算机科学,2010,7(8):168-171,1.
[3] BELL N,GARLAND M.Implementing Sparse Matrix-VectorMultiplication On Throughput-Oriented Processors[C]∥Proceedings of the Conference on High Performance Computing Networking.New York:Association for Computing Machinery,2009:1-11.
[4] RESIOS A.GPU Performance Prediction Using ParametrizedModels[D].Utrecht:Utrecht University,2011.
[5] VZQUEZ F,GARZON E M,FEMANDEZ J J.A matrix approach to grahpic reconstruction and its implementation on GPUs[J].Journal of Structural Biology,2010,0(1):146-151.
[6] BIK A J C,WIJSHOFF H A G.Automatic data structure selection and transformation for sparse matrix computations[J].IEEE Transactions on Parallel and Distributed Systems,1996,7(2):109-126.
[7] IM E J,YELICK K.Optimizing Sparse Matrix-Vector Multiplication on SMPs[EB/OL].http://www.cs.berkeley.edu/yelick/ejim/ppsc99.ps.
[8] LEE BENJIAMIN C,VUDUC RICHARD W,YELICK D J W,et al.Performance Model for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply[C]∥2004 International Conference on Parallel Processing.United States:IEEE Computer Society,2004:169-174.
[9] FATAHAIAN K,SUGERMAN J,HANRAHAN P.Under-standing the efficiency of GPU algorithms for matrix-matrix multiplications[C]∥Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware.New York:Association for Computing Machinery,2004:133-138.
[10] BOLZ J,FARMER I,GRINSPUN E,et al.Sparse Matrix Sol-vers on the GPU:Conjugate Gradients and Multigrid[J].ACM Transactions on Graphics,2003,2(3):917-924.
[11] VZQUEZ F,ORTEGA G,FERNANDEZ J J,et al.Improving the Performance of the Sparse Matrix Vector Product with GPUs[C]∥10th IEEE International Conference on Computer and Information Technology.Bradford,United States:IEEE Computer Society,2010:1146-1151.
[12] KORNILIOS K,GEORGIOS G,NECTARIOS K.Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression[C]∥Proceedings of the International Conference on Parallel Processing.United States:Institute of Electrical and Electronics Engineers,2008:511-519.
[13] MONAKOV A,AVETISYAN A.Implementing Blocked Sparse Matrix Vector Multiplication on NVIDIA GPUs[C]∥9th International Workshop on Embedded Computer Systems:Architectures,Modeling,and Simulation.Germany:Springer Verlag,2009,5657:289-297.
[14] BULUC A,FINEMAN J T,MATTEO F,et al.Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks[C]∥Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures.New York:Association for Computing Machinery,2009:233-244.
[15] XU S M,XUE W,LIN H X.Performance Modeling and Optimization of Sparse Matrix-Vector Multiplication on NVIDIA CUDA Platform[J].Journal of Supercomputing,2013,3(3):710-721.
[16] ZHANG Y,OWENS J D.A Quantitative Performance Analysis Model for GPU Architectures[C]∥International Symposium on High-Performance Computer Architecture.United states:IEEE Computer Society,2011:382-393.
[17] SARA S B,MATTHIEU D,PATEL S J,et al.An AdaptivePerformance Modeling Tool for GPU Architectures[C]∥Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.New York:Association for Computing Machinery,2010:105-114.
[18] HONG S,KIM H.An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness[C]∥International Symposium on Computer Architecture.United States:Institute of Electrical and Electronics Engineers,2009:152-163.
[19] KISHORE K,RISHABH M,REHMAN S M,et al.A Performance Prediction Model for the CUDA GPGPU Platform[C]∥16th International Conference on High Performance Computing.United States:IEEE Computer Society,2009:463-472.
[20] LIANG T.The Research on Optimizing Sparse Matrix Computation Based on GPU[D].Wuhan:Huazhong University of Scien-ce and Technology,2012.(in Chinese) 梁添.基于GPU的稀疏矩阵运算优化研究[D].武汉:华中科技大学,2012.
[21] BARRETT R,et al.Templates for the solution of Linear Systems:Building blocks for Iterative Methods[R].Philadelphia:SIAM Press,1994.
[22] RUKHSANA S,ANILA U,CHUGTAI I R.Implementationand Evaluation of Parallel Sparse Matrix-Vector Products on Distributed Memory Parallel Computers[C]∥IEEE International Conference on Cluster Computing.United States:Institute of Electrical and Electronics Engineers,2006.
[23] WANG Z W,CHENG L L,ZHAO W Q.Parallel Computer Performance Analysis Model Based on GPU[J].Computer Science,2014,41(1):31-38.(in Chinese) 王卓薇,程良伦,赵武清.基于GPU的并行计算性能分析模型[J].计算机科学,2014,41(1):31-38.
[24] PING G,WANG L Q,CHEN P.A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs[J].IEEE Transactions on Parallel And Distributed Systems,2014,5(5):1112-1123.
[25] DAVIS TIMOTHY A,HU Y F.The University of Florida Sparse Matrix Collection[J].ACM Transactions on Mathematical Software,2011,8(1):1-28.
[26] SAMUEL W,LEONID O,RICHARD V,et al.Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms[J].Parallel Computing,2009,5(3):178-194.
[27] VASILEIOS K,GOUMAS G,NECTARIOS K.Performance Mo-dels for Blocked Sparse Matrix-vector Multiplication Kernels[C]∥2009 International Conference on Parallel Processing.United States:IEEE,2009:356-364.
[28] FENG X W.Research on Key Technology of Similarity Calculation based on GPU[D].Wuhan:Huazhong University of Science and Technology,2014.(in Chinese) 冯晓文.基于GPU的相似度计算关键技术研究[D].武汉:华中科技大学,2014.
[29] CHEN R X.Parallelized XML Query Based on Functional Intermediate Language[J].Journal of Chongqing Uiversity of Technology(Natural Science),2011,5(7):81-86.(in Chinese) 陈荣鑫.基于函数式中间语言的XML查询并行化[J].重庆理工大学学报(自然科学),2011,5(7):81-86.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!