Computer Science ›› 2017, Vol. 44 ›› Issue (4): 182-187. doi: 10.11896/j.issn.1002-137X.2017.04.040

• High Performance Computing •

Performance Model of Sparse Matrix Vector Multiplication on GPU

YIN Meng-jia(1,2), XU Xian-bin(1), HE Shui-bing(1), HU Jing(1), YE Cong-huan(2) and ZHANG Tao(2)

  1. School of Computer Science, Wuhan University, Wuhan 430072, China; 2. School of Computer and Information Science, Hubei Engineering University, Xiaogan 432000, China
  • Online: 2018-11-13  Published: 2018-11-13
  • Supported by: the General Program of the National Natural Science Foundation of China (61572377), the Young Scientists Fund of the National Natural Science Foundation of China (61502154) and a project of the Hubei Provincial Department of Education (2016179)

Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation widely used in solving large-scale linear systems and matrix eigenvalue problems. In iterative solvers it often becomes the processing bottleneck and limits the overall performance of the algorithm. For matrices of different shapes and sparsity patterns, the choice of storage format, and of the kernel that goes with it, can have a large impact on performance. Through experimental analysis, this work identifies how matrices of various patterns behave under different storage structures and builds an effective performance model that estimates the cost of sparse matrix operations and guides the selection of an appropriate storage format. On 14 test cases in the CSR, COO and HYB formats and 8 test cases in the ELL format, the difference between the model's predictions and the measured performance is below 9%.

Key words: GPU, sparse matrix-vector multiplication, performance model
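
For readers who want a concrete picture of the kernels such a model describes, the following is a minimal sketch of a CSR-format SpMV kernel in CUDA, written in the style of the well-known scalar (one-thread-per-row) kernel of Bell and Garland. It is only an illustration of the memory-access pattern whose cost a format-aware performance model has to capture, not the authors' implementation; the function names, array names and the 256-thread block size are assumptions made for the example.

// Minimal illustrative sketch of CSR-format SpMV (one thread per row).
// Not the implementation evaluated in the paper; names and launch
// parameters are assumptions chosen for the example.
#include <cuda_runtime.h>

__global__ void spmv_csr_scalar(int num_rows,
                                const int   *row_ptr,   // CSR row pointers, length num_rows + 1
                                const int   *col_idx,   // column index of each nonzero
                                const float *values,    // value of each nonzero
                                const float *x,         // dense input vector
                                float       *y)         // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        // Each thread accumulates the dot product of one sparse row with x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += values[j] * x[col_idx[j]];
        y[row] = dot;
    }
}

// Host-side launch: one thread per matrix row, 256 threads per block (assumed).
void spmv_csr(int num_rows, const int *d_row_ptr, const int *d_col_idx,
              const float *d_values, const float *d_x, float *d_y)
{
    int threads = 256;
    int blocks  = (num_rows + threads - 1) / threads;
    spmv_csr_scalar<<<blocks, threads>>>(num_rows, d_row_ptr, d_col_idx,
                                         d_values, d_x, d_y);
}

Kernels of this kind are usually memory-bandwidth bound: each nonzero costs a read of its value and column index plus a gather from x, and the row lengths determine how evenly that work is spread across threads. The bytes moved per nonzero and the load balance differ between CSR, COO, ELL and HYB, which is why the storage format chosen for a given matrix pattern has such a visible effect on measured SpMV performance.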

