Computer Science, 2016, Vol. 43, Issue (11): 30-35. doi: 10.11896/j.issn.1002-137X.2016.11.006


High-precision Architecture-level Power Model Based on GPU

WANG Zhuo-wei, CHENG Liang-lun and XIAO Hong   

  • Online:2018-12-01 Published:2018-12-01

Abstract: As hardware capabilities continue to develop and software development environments gradually mature, the graphics processing unit (GPU) has been applied to general-purpose computation to help the central processing unit (CPU) accelerate programs. To obtain high performance, a GPU generally contains hundreds of core arithmetic units. Owing to this high density of computing resources, the performance of the GPU is far superior to that of the CPU, but its power consumption is also larger. Power consumption has become one of the important issues restricting the development of GPUs. Based on a study of the Fermi GPU architecture, a high-precision architecture-level power model is proposed. In this model, the power consumed by the different native instructions and by each memory access is first calculated; the power consumption is then analysed and predicted according to the instructions executed on the hardware, and measured results are acquired using sampling instruments. Finally, the results obtained from practical testing and from the power model are compared using 13 benchmark applications. The results demonstrate that the prediction accuracy of the model can reach approximately 90%.

Key words: GPU, Fermi, Power model, Native instruction, Memory power consumption
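
The general idea outlined in the abstract (a per-instruction and per-memory-access cost, accumulated over the instructions a kernel executes, then compared with a measured value) can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give their equations, so the additive form, the static-power term, the accuracy formula, and every name and number below (`KernelProfile`, `predict_power_w`, `prediction_accuracy`, the energy values) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class KernelProfile:
    """Hypothetical execution profile of one GPU kernel."""
    instr_counts: Dict[str, int]  # native instruction class -> executed count
    mem_counts: Dict[str, int]    # memory space -> number of accesses
    runtime_s: float              # kernel run time in seconds


def predict_power_w(profile: KernelProfile,
                    instr_energy_j: Dict[str, float],
                    mem_energy_j: Dict[str, float],
                    static_power_w: float) -> float:
    """Assumed additive model: static power plus dynamic energy / run time."""
    # Energy attributed to executed native instructions.
    energy = sum(instr_energy_j[op] * n
                 for op, n in profile.instr_counts.items())
    # Energy attributed to memory accesses.
    energy += sum(mem_energy_j[space] * n
                  for space, n in profile.mem_counts.items())
    return static_power_w + energy / profile.runtime_s


def prediction_accuracy(predicted_w: float, measured_w: float) -> float:
    """Assumed metric: 1 minus the relative error against the measurement."""
    return 1.0 - abs(predicted_w - measured_w) / measured_w


# Placeholder per-event energies (joules); not taken from the paper.
INSTR_ENERGY_J = {"FADD": 2e-9, "FMUL": 3e-9}
MEM_ENERGY_J = {"global": 10e-9}

profile = KernelProfile(
    instr_counts={"FADD": 1_000_000_000, "FMUL": 1_000_000_000},
    mem_counts={"global": 500_000_000},
    runtime_s=0.5,
)

predicted = predict_power_w(profile, INSTR_ENERGY_J, MEM_ENERGY_J,
                            static_power_w=40.0)
# 66.0 W stands in for a reading from an external power meter.
accuracy = prediction_accuracy(predicted, measured_w=66.0)
print(f"predicted {predicted:.1f} W, accuracy {accuracy:.1%}")
```

In practice the per-instruction and per-access energies would have to be calibrated from measurements (for example with microbenchmarks that stress one instruction class at a time), and the instruction counts would come from a profiler or simulator; both steps are outside the scope of this sketch.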

