Computer Science, 2016, Vol. 43, Issue (5): 22-26. doi: 10.11896/j.issn.1002-137X.2016.05.004


Branch Divergence Optimization for Performance and Power Consumption on GPU Platform

YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying and CHEN Wei   

Online: 2018-12-01  Published: 2018-12-01

Abstract: Because of their tremendous computing power, general-purpose graphics processing units (GPGPUs) have been widely adopted for general-purpose computing. However, because GPGPUs use an execution model called SIMT (Single Instruction, Multiple Threads), their efficiency degrades when a GPU application exhibits branch divergence. Methods based on thread swapping have been proposed to reduce the performance loss caused by branch divergence, but these methods introduce extra memory accesses, which not only erode part of the performance gain but also increase power consumption. First, an example was used to explain how the thread swapping range affects a program's performance and power consumption. Second, a method was proposed to reduce the extra memory accesses introduced by thread swapping. Experiments show that for Reduction, this method reduces power consumption by 7% at the cost of an average performance loss of 4% when the swapping range is 256. For Bitonic, it improves performance by 6.4% and 5.3%, with no power consumption overhead, when the swapping range is 256 and 512, respectively.
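The trade-off described in the abstract, where a wider swapping range removes more divergence but relocates more thread data, can be illustrated with a small host-side model. This is a minimal sketch, not the authors' method: it assumes 32-thread warps, a single boolean branch condition per thread, and that every thread whose data is relocated costs one extra memory access. The function names `count_divergent_warps` and `swap_within_range` are hypothetical, and regrouping is modeled as a stable partition inside each window of `swap_range` consecutive threads.

```python
from __future__ import annotations

WARP_SIZE = 32  # assumption: NVIDIA-style 32-thread warps


def count_divergent_warps(pred: list[bool], warp_size: int = WARP_SIZE) -> int:
    """Count warps whose threads disagree on the branch condition.

    Under SIMT, such a warp must execute both branch paths serially.
    """
    divergent = 0
    for start in range(0, len(pred), warp_size):
        warp = pred[start:start + warp_size]
        if any(warp) and not all(warp):
            divergent += 1
    return divergent


def swap_within_range(pred: list[bool], swap_range: int) -> tuple[list[bool], int]:
    """Hypothetical thread-swapping model.

    Within each window of `swap_range` consecutive threads, regroup threads
    so that taken-branch threads are contiguous (stable partition).
    Returns the reordered predicates and the number of threads whose
    position changed, used here as a proxy for extra memory accesses.
    """
    new_pred: list[bool] = []
    moved = 0
    for start in range(0, len(pred), swap_range):
        window = pred[start:start + swap_range]
        # Old indices of taken-branch threads first, then the rest.
        order = [i for i, p in enumerate(window) if p] + \
                [i for i, p in enumerate(window) if not p]
        # A thread whose new slot differs from its old slot must move its data.
        moved += sum(1 for new_pos, old_pos in enumerate(order) if new_pos != old_pos)
        new_pred.extend(window[i] for i in order)
    return new_pred, moved


if __name__ == "__main__":
    # Worst case: even/odd threads take different paths, so every warp diverges.
    pred = [i % 2 == 0 for i in range(512)]
    print("before:", count_divergent_warps(pred), "divergent warps")
    for rng in (32, 64, 512):
        new_pred, moved = swap_within_range(pred, rng)
        print(f"range {rng}: {count_divergent_warps(new_pred)} divergent warps, "
              f"{moved} threads moved")
```

For this alternating pattern, all 16 warps start out divergent. A range of 32 (one warp) moves 480 threads and still leaves 16 divergent warps, because each warp only has its own threads to regroup. A range of 64 eliminates divergence with 496 moves, and a range of 512 also eliminates it but moves 510 threads. Widening the range past the point where divergence disappears therefore only adds data movement, which mirrors the memory-access and power cost the abstract attributes to larger swapping ranges.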

Key words: Branch divergence, Memory access, Thread swapping

