计算机科学 ›› 2017, Vol. 44 ›› Issue (4): 182-187.doi: 10.11896/j.issn.1002-137X.2017.04.040
尹孟嘉,许先斌,何水兵,胡婧,叶从欢,张涛
YIN Meng-jia, XU Xian-bin, HE Shui-bing, HU Jing, YE Cong-huan and ZHANG Tao
摘要: 稀疏矩阵向量乘(Sparse matrix-vector multiplication,SPMV)是广泛应用于大规模线性求解系统和求解矩阵特征值等问题的基本运算,但在迭代处理过程中它也常常成为处理的瓶颈,影响算法的整体性能。对于不同形态的矩阵,选择不同的存储格式 ,对应的算法往往会产生较大的性能影响。通过实验分析,找到各种矩阵形态在不同存储结构下体现的性能变化特征,构建一个有效的性能度量模型,为评估稀疏矩阵运算开销、合理选择存储格式做出有效的指导。在14组CSR,COO,HYB格式和8组ELL格式的测试用例下,性能预测模型和测量之间的差异低于9%。
[1] SHERESHEVSKY M,CUKIC B,CROWEL J,et al.Software Aging and Multifractality of Memory Resources[C]∥Procee-dings of DSN 2003.Los Alamitos,USA:IEEE Computer Society,2003:721-730. [2] BAI H T,OUYANG D T,LI X M,et al.Optimizing Sparse Matrix-vecotr Multiplication Based on GPU[J].Computer Science,2010,7(8):168-171,1.(in Chinese) 白洪涛,欧阳丹彤,李熙铭,等.基于GPU的稀疏矩阵向量乘优化[J].计算机科学,2010,7(8):168-171,1. [3] BELL N,GARLAND M.Implementing Sparse Matrix-VectorMultiplication On Throughput-Oriented Processors[C]∥Proceedings of the Conference on High Performance Computing Networking.New York:Association for Computing Machinery,2009:1-11. [4] RESIOS A.GPU Performance Prediction Using ParametrizedModels[D].Utrecht:Utrecht University,2011. [5] VZQUEZ F,GARZON E M,FEMANDEZ J J.A matrix approach to grahpic reconstruction and its implementation on GPUs[J].Journal of Structural Biology,2010,0(1):146-151. [6] BIK A J C,WIJSHOFF H A G.Automatic data structure selection and transformation for sparse matrix computations[J].IEEE Transactions on Parallel and Distributed Systems,1996,7(2):109-126. [7] IM E J,YELICK K.Optimizing Sparse Matrix-Vector Multiplication on SMPs[EB/OL].http://www.cs.berkeley.edu/yelick/ejim/ppsc99.ps. [8] LEE BENJIAMIN C,VUDUC RICHARD W,YELICK D J W,et al.Performance Model for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply[C]∥2004 International Conference on Parallel Processing.United States:IEEE Computer Society,2004:169-174. [9] FATAHAIAN K,SUGERMAN J,HANRAHAN P.Under-standing the efficiency of GPU algorithms for matrix-matrix multiplications[C]∥Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware.New York:Association for Computing Machinery,2004:133-138. [10] BOLZ J,FARMER I,GRINSPUN E,et al.Sparse Matrix Sol-vers on the GPU:Conjugate Gradients and Multigrid[J].ACM Transactions on Graphics,2003,2(3):917-924. [11] VZQUEZ F,ORTEGA G,FERNANDEZ J J,et al.Improving the Performance of the Sparse Matrix Vector Product with GPUs[C]∥10th IEEE International Conference on Computer and Information Technology.Bradford,United States:IEEE Computer Society,2010:1146-1151. [12] KORNILIOS K,GEORGIOS G,NECTARIOS K.Improving the performance of multithreaded sparse matrix-vector multiplication using index and value compression[C]∥Proceedings of the International Conference on Parallel Processing.United States:Institute of Electrical and Electronics Engineers,2008:511-519. [13] MONAKOV A,AVETISYAN A.Implementing Blocked Sparse Matrix Vector Multiplication on NVIDIA GPUs[C]∥9th International Workshop on Embedded Computer Systems:Architectures,Modeling,and Simulation.Germany:Springer Verlag,2009,5657:289-297. [14] BULUC A,FINEMAN J T,MATTEO F,et al.Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks[C]∥Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures.New York:Association for Computing Machinery,2009:233-244. [15] XU S M,XUE W,LIN H X.Performance Modeling and Optimization of Sparse Matrix-Vector Multiplication on NVIDIA CUDA Platform[J].Journal of Supercomputing,2013,3(3):710-721. [16] ZHANG Y,OWENS J D.A Quantitative Performance Analysis Model for GPU Architectures[C]∥International Symposium on High-Performance Computer Architecture.United states:IEEE Computer Society,2011:382-393. [17] SARA S B,MATTHIEU D,PATEL S J,et al.An AdaptivePerformance Modeling Tool for GPU Architectures[C]∥Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.New York:Association for Computing Machinery,2010:105-114. [18] HONG S,KIM H.An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness[C]∥International Symposium on Computer Architecture.United States:Institute of Electrical and Electronics Engineers,2009:152-163. [19] KISHORE K,RISHABH M,REHMAN S M,et al.A Performance Prediction Model for the CUDA GPGPU Platform[C]∥16th International Conference on High Performance Computing.United States:IEEE Computer Society,2009:463-472. [20] LIANG T.The Research on Optimizing Sparse Matrix Computation Based on GPU[D].Wuhan:Huazhong University of Scien-ce and Technology,2012.(in Chinese) 梁添.基于GPU的稀疏矩阵运算优化研究[D].武汉:华中科技大学,2012. [21] BARRETT R,et al.Templates for the solution of Linear Systems:Building blocks for Iterative Methods[R].Philadelphia:SIAM Press,1994. [22] RUKHSANA S,ANILA U,CHUGTAI I R.Implementationand Evaluation of Parallel Sparse Matrix-Vector Products on Distributed Memory Parallel Computers[C]∥IEEE International Conference on Cluster Computing.United States:Institute of Electrical and Electronics Engineers,2006. [23] WANG Z W,CHENG L L,ZHAO W Q.Parallel Computer Performance Analysis Model Based on GPU[J].Computer Science,2014,41(1):31-38.(in Chinese) 王卓薇,程良伦,赵武清.基于GPU的并行计算性能分析模型[J].计算机科学,2014,41(1):31-38. [24] PING G,WANG L Q,CHEN P.A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs[J].IEEE Transactions on Parallel And Distributed Systems,2014,5(5):1112-1123. [25] DAVIS TIMOTHY A,HU Y F.The University of Florida Sparse Matrix Collection[J].ACM Transactions on Mathematical Software,2011,8(1):1-28. [26] SAMUEL W,LEONID O,RICHARD V,et al.Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms[J].Parallel Computing,2009,5(3):178-194. [27] VASILEIOS K,GOUMAS G,NECTARIOS K.Performance Mo-dels for Blocked Sparse Matrix-vector Multiplication Kernels[C]∥2009 International Conference on Parallel Processing.United States:IEEE,2009:356-364. [28] FENG X W.Research on Key Technology of Similarity Calculation based on GPU[D].Wuhan:Huazhong University of Science and Technology,2014.(in Chinese) 冯晓文.基于GPU的相似度计算关键技术研究[D].武汉:华中科技大学,2014. [29] CHEN R X.Parallelized XML Query Based on Functional Intermediate Language[J].Journal of Chongqing Uiversity of Technology(Natural Science),2011,5(7):81-86.(in Chinese) 陈荣鑫.基于函数式中间语言的XML查询并行化[J].重庆理工大学学报(自然科学),2011,5(7):81-86. |
No related articles found! |
|