Computer Science, 2014, Vol. 41, Issue (1): 31-38.
WANG Zhuo-wei, CHENG Liang-lun and ZHAO Wu-qing
[1] Profiler A S. ATI Stream Profiler. http://developer.amd.com
[2] Nsight N P. NVIDIA Parallel Nsight. http://developer.nvidia.com
[3] Collange S, et al. Barra: A Parallel Functional Simulator for GPGPU[C]∥IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS). 2010
[4] Diamos G F, et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems[C]∥19th International Conference on Parallel Architectures and Compilation Techniques, PACT 2010. Vienna, Austria: Institute of Electrical and Electronics Engineers Inc, 2010
[5] Ryoo S, et al. Program optimization carving for GPU computing[J]. Journal of Parallel and Distributed Computing, 2008, 68(10): 1389-1401
[6] Liu Y, Zhang E Z, Shen X. A Cross-Input Adaptive Framework for GPU Program Optimizations[C]∥23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009. Rome, Italy: IEEE Computer Society, 2009
[7] Meng J, Skadron K. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs[C]∥23rd International Conference on Supercomputing, ICS'09. Yorktown Heights, NY, United States: Association for Computing Machinery, 2009
[8] Choi J W, Singh A, Vuduc R W. Model-driven autotuning of sparse matrix-vector multiply on GPUs[C]∥2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10. Bangalore, India: Association for Computing Machinery, 2010
[9] Baskaran M M, et al. A compiler framework for optimization of affine loop nests for GPGPUs[C]∥22nd ACM International Conference on Supercomputing, ICS'08. Island of Kos, Greece: Association for Computing Machinery, 2008
[10] Collange S, et al. Barra: A Parallel Functional Simulator for GPGPU[C]∥2010 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS). 2010
[11] Volkov V, Demmel J W. Benchmarking GPUs to tune dense linear algebra[C]∥2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008. Austin, TX, United States: IEEE Computer Society, 2008
[12] Zhang Y, Cohen J, Owens J D. Fast tridiagonal solvers on the GPU[C]∥2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'10. Bangalore, India: Association for Computing Machinery, 2010
[13] Goddeke D, Strzodka R. Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid[J]. IEEE Transactions on Parallel and Distributed Systems, 2011, 23(1): 22-32
[14] Bell N, Garland M. Implementing sparse matrix-vector multiplication on throughput-oriented processors[C]∥SC'09: Proceedings of the 2009 ACM/IEEE Conference on Supercomputing. Nov. 2009, 18: 1-11
[15] Choi J W, Singh A, Vuduc R W. Model driven autotuning of sparse matrix-vector multiply on GPUs[C]∥Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2010). ACM, Jan. 2010: 115-126