%A TANG Tao, PENG Lin, HUANG Chun and YANG Can-qun
%T Performance Analysis of GPU Programs Towards Better Memory Hierarchy Design
%0 Journal Article
%D 2017
%J Computer Science
%R 10.11896/j.issn.1002-137X.2017.12.001
%P 1-10
%V 44
%N 12
%U {https://www.jsjkx.com/CN/abstract/article_16481.shtml}
%8 2018-12-01
%X With higher peak performance and energy efficiency than CPUs, as well as an increasingly mature software environment, GPUs have become one of the most popular accelerators for building heterogeneous parallel computing systems. A GPU generally hides memory access latency through a flexible, lightweight thread-switching mechanism. However, its massive parallelism puts its memory system under severe pressure, so its actual performance depends heavily on the efficiency of memory access operations. The analysis and optimization of the memory access behavior of GPU programs have therefore long been active topics in GPU research. Yet few existing works have analyzed, from an architectural perspective, how memory hierarchy design affects performance. To better guide both the design of GPU memory hierarchies and program optimization, this paper experimentally analyzes in detail how each level of the GPU memory hierarchy influences program performance, and summarizes several strategies for the memory hierarchy design of future GPU-like architectures and for program optimization.