Branch Divergence Optimization for Performance and Power Consumption on GPU Platform
(GPU平台上面向性能和功耗的分支优化)

YU Qi (于齐), WANG Bo-qian (王博千), SHEN Li (沈立), WANG Zhi-ying (王志英), CHEN Wei (陈微)

College of Computer, National University of Defense Technology, Changsha 410073, China (all authors)

Computer Science (计算机科学), 2016, Vol. 43, Issue 5: 22-26. doi: 10.11896/j.issn.1002-137X.2016.05.004

Online: 2018-12-01 Published: 2018-12-01

Funding: This work was supported by the National Natural Science Foundation of China (61472431, 61202121) and the New Teacher Fund of the Doctoral Program of Higher Education, Ministry of Education of China (20114307120013).

Abstract

Abstract: Their high computing power has made general-purpose graphics processing units (GPGPUs) widely used in general-purpose computing. However, GPGPUs execute in SIMT (Single Instruction, Multiple Threads) fashion, so their efficiency suffers severely from branch divergence in applications. Thread swapping has been proposed to reduce the performance loss caused by divergent branches, but it often introduces extra memory accesses, which both erode part of the performance gain and increase power consumption. This paper first uses an example to show how the thread-swapping range affects a program's performance and power consumption, and then proposes a method that reduces the extra memory accesses introduced by thread swapping. Experiments show that for the Reduction program with a swapping range of 256, power consumption falls by up to 7% at an average performance loss of 4%; for the Bitonic program with swapping ranges of 256 and 512, performance improves by up to 6.4% and 5.3%, respectively, with no power overhead.

Key words: Branch divergence, Memory access, Thread swapping
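
The trade-off described in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the authors' method: the function names `divergent_warps` and `swap_within_range` are hypothetical, a warp is assumed to hold 32 threads, and the number of threads that change position is used as a rough stand-in for the extra memory accesses caused by swapping. Power consumption is not modeled at all.

```python
WARP_SIZE = 32  # assumed warp width; threads in a warp execute in lockstep


def divergent_warps(flags, warp_size=WARP_SIZE):
    """Count warps in which threads disagree on the branch direction.

    `flags[i]` is True if thread i takes the branch. A warp is divergent
    when it contains both True and False, so both paths must be executed.
    """
    count = 0
    for start in range(0, len(flags), warp_size):
        warp = flags[start:start + warp_size]
        if any(warp) and not all(warp):
            count += 1
    return count


def swap_within_range(flags, swap_range):
    """Regroup threads inside each window of `swap_range` threads.

    Threads that take the branch are placed first in each window.
    Returns the regrouped flags and the number of positions whose flag
    changed, used here as a crude proxy for extra memory accesses.
    """
    new_flags = []
    moved = 0
    for start in range(0, len(flags), swap_range):
        window = flags[start:start + swap_range]
        regrouped = sorted(window, reverse=True)  # True sorts before False
        moved += sum(1 for a, b in zip(window, regrouped) if a != b)
        new_flags.extend(regrouped)
    return new_flags, moved


if __name__ == "__main__":
    # 256 threads whose branch outcome flips every 16 threads.
    flags = [(i // 16) % 2 == 0 for i in range(256)]
    print("no swapping:", divergent_warps(flags), "divergent warps")
    for swap_range in (32, 64, 256):
        regrouped, moved = swap_within_range(flags, swap_range)
        print(swap_range, divergent_warps(regrouped), moved)
```

In this toy input, a swapping range equal to the warp size cannot remove any divergence, while a wider range makes every warp uniform at the cost of moving threads. The real cost of those moves depends on the memory system, which is what the paper's experiments measure.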

YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying, CHEN Wei. Branch Divergence Optimization for Performance and Power Consumption on GPU Platform[J]. Computer Science, 2016, 43(5): 22-26. https://doi.org/10.11896/j.issn.1002-137X.2016.05.004
