Quantitative Performance Analysis Model of Matrix Multiplication Based on GPU

Abstract

Performance evaluation and optimization are indispensable when designing efficient parallel programs, and the performance of the memory system directly affects the performance of the processor. We used GPGPU-Sim to simulate the GPU memory hierarchy and determined the optimal ratio between the number of streaming multiprocessors (SMs) and the number of memory controllers in a GPU. Matrix multiplication is fundamental to scientific computing and is representative of applications that are both compute-intensive and memory-intensive, so its performance is an important indicator of a GPU's high-performance computing capability. Performance modeling is a relatively new approach to evaluating parallel systems and offers several advantages. To improve the performance of matrix multiplication, this paper proposes a quantitative performance model for GPUs. The model quantitatively analyzes the instruction pipeline, shared memory accesses, and global memory accesses; from this analysis it identifies performance bottlenecks and guides optimizations that increase execution speed. Experiments show that the model is practical and effectively supports the optimization of the matrix multiplication algorithm.
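The abstract does not reproduce the model's equations. As a rough illustration only, the sketch below assumes a common simplification: execution time is bounded by whichever of the three components named above (instruction pipeline, approximated here by peak arithmetic throughput; shared memory; global memory) takes longest, and the kernel uses standard square shared-memory tiling without register blocking. The names `GpuParams`, `tiled_matmul_costs`, and `estimate`, and all hardware numbers, are hypothetical; this is not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class GpuParams:
    # Hypothetical hardware parameters; substitute measured or datasheet values.
    flops_per_s: float         # peak arithmetic throughput (FLOP/s), stands in for the pipeline
    shared_bytes_per_s: float  # aggregate shared memory bandwidth (bytes/s)
    global_bytes_per_s: float  # global (DRAM) memory bandwidth (bytes/s)


def tiled_matmul_costs(n: int, tile: int, elem_bytes: int = 4):
    """Operation and traffic counts for C = A * B (all n x n) with square tiles.

    Assumes each thread block computes one tile x tile block of C and, per step,
    stages one tile of A and one tile of B in shared memory.
    """
    assert n % tile == 0, "sketch assumes n is a multiple of the tile size"
    flops = 2 * n ** 3  # one multiply and one add per inner-product term
    # Each of the (n/tile)^2 blocks runs n/tile steps loading 2*tile^2 elements,
    # i.e. 2*n^3/tile global reads in total, plus n^2 writes of C.
    global_bytes = (2 * n ** 3 // tile + n * n) * elem_bytes
    # Each thread reads one A and one B operand from shared memory per multiply-add.
    shared_bytes = 2 * n ** 3 * elem_bytes
    return flops, shared_bytes, global_bytes


def estimate(n: int, tile: int, gpu: GpuParams):
    """Return (estimated time, bottleneck name, per-component times)."""
    flops, shared_bytes, global_bytes = tiled_matmul_costs(n, tile)
    times = {
        "compute": flops / gpu.flops_per_s,
        "shared memory": shared_bytes / gpu.shared_bytes_per_s,
        "global memory": global_bytes / gpu.global_bytes_per_s,
    }
    # Simplifying assumption: the slowest component dominates (no partial overlap modeled).
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck, times


if __name__ == "__main__":
    gpu = GpuParams(flops_per_s=1e12, shared_bytes_per_s=1e12, global_bytes_per_s=1e11)
    t, bottleneck, times = estimate(1024, 16, gpu)
    print(bottleneck)  # → shared memory
```

With these made-up parameters, a larger tile cuts global traffic in proportion to the tile size, while shared-memory traffic stays at 2n³ operands, so shared memory remains the limiting component. That is the kind of bottleneck a quantitative model is meant to expose before choosing the next optimization.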

Key words: GPU, GPGPU-Sim, matrix multiplication, quantitative performance analysis model, instruction pipeline, shared memory access, global memory access

YIN Meng-jia, XU Xian-bin, XIONG Zeng-gang and ZHANG Tao. Quantitative Performance Analysis Model of Matrix Multiplication Based on GPU[J]. Computer Science, 2015, 42(12): 13-17.

