Computer Science, 2013, Vol. 40, Issue (Z6): 210-216.

• Pattern Recognition •

Low-power Design Techniques for GPU

TIAN Ze, ZHANG Jun, XU Hong-jie, GUO Liang, LI Xiao-yu

  1. Xi'an Aeronautics Computing Technique Research Institute, AVIC, Xi'an 710077 (all authors)
  • Online: 2018-11-16  Published: 2018-11-16
  • Funding:
    Supported by the 2012 General Armament Department Pre-research Foundation (9140A08010712HK61095) and the Innovation Fund of the Aviation Industry Corporation of China (2010BD63111).

Abstract: Graphics processing units (GPUs) are used increasingly widely because of their strong graphics acceleration and their performance in general-purpose computing. As chip scale and integration density continue to rise, however, the power consumption of a single GPU chip has reached 376 W. That is two to three times the power of a high-end general-purpose processor. High power consumption causes problems with reliability, stability, and chip cost, and as a result the "power wall" has become one of the key obstacles that future GPU designs must overcome. This paper works at the architecture level and draws on the structural characteristics of the GPU rendering pipeline. It describes GPU low-power design techniques in the following areas:
• depth testing and hidden-surface removal
• the shader data path
• texture mapping and compression
• rendering strategy
• the register file and on-chip cache
The paper concludes by identifying directions for further research on low-power GPU design.

Key words: Graphics processing unit (GPU), Low power, Rendering, Cache

