Parallel Computation Performance Analysis Model Based on GPU

Abstract

Abstract: To address the lack of an accurate performance analysis model for GPU-based parallel computing, we propose a quantitative performance model that models the time spent in three major GPU components: the instruction pipeline, shared memory access, and global memory access. The model is designed to help programmers find performance bottlenecks and improve system performance efficiently. To demonstrate its usefulness and to optimize algorithm performance, we analyze three representative real-world programs: dense matrix multiplication, a tridiagonal system solver, and sparse matrix-vector multiplication.

Key words: GPU, Quantitative performance model, Instruction pipeline, Shared memory access time, Global memory access time

WANG Zhuo-wei, CHENG Liang-lun and ZHAO Wu-qing. Parallel Computation Performance Analysis Model Based on GPU[J]. Computer Science, 2014, 41(1): 31-38.
