Design and Implementation of Separated Path Floating-point Fused Multiply-Add Unit

Abstract

Abstract: A traditional floating-point fused multiply-add (FMA) unit increases the latency of standalone floating-point addition, subtraction, and multiplication. To address this shortcoming, a separated path FMA (SPFMA) unit was designed and implemented. By separating the multiplication and addition paths, the SPFMA unit keeps the FMA latency at 6 cycles while reducing the latency of standalone multiplication and addition operations from 6 cycles to 4 cycles. Logic synthesis with a dedicated technology cell library shows that the SPFMA can run at above 1.2 GHz with an area of 60779.44 μm². Finally, the SPFMA was evaluated by running the SPEC CPU2000 floating-point benchmarks on a hardware emulation accelerator platform. All floating-point benchmarks improved, by up to 5.25% and by 1.61% on average, which indicates that the SPFMA can further improve floating-point performance.

Key words: Floating-point addition, Floating-point multiplication, Fused multiply-add, Separated path, Floating-point performance, Operation latency
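
The latency figures in the abstract can be illustrated with a toy model. The sketch below is not the authors' evaluation method: it assumes a chain of operations in which each one must wait for the previous result, and it ignores pipelining and issue constraints. Only the cycle counts (6 for every operation on a traditional FMA unit; 6 for FMA and 4 for standalone multiply or add on the SPFMA) come from the abstract; the table and function names are hypothetical.

```python
# Toy latency model. Cycle counts are taken from the abstract; the names
# and the fully dependent chain are illustrative assumptions.
TRADITIONAL_FMA = {"fma": 6, "add": 6, "mul": 6}
SPFMA = {"fma": 6, "add": 4, "mul": 4}


def chain_latency(ops, latency_table):
    """Total cycles for a chain where each op waits for the previous result."""
    return sum(latency_table[op] for op in ops)


if __name__ == "__main__":
    ops = ["mul", "add", "fma"]
    print("traditional FMA:", chain_latency(ops, TRADITIONAL_FMA))
    print("SPFMA:", chain_latency(ops, SPFMA))
```

Under this model the saving depends entirely on how many standalone multiplications and additions sit on the critical path, which is one plausible reason the measured gains vary across benchmarks.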

HE Jun, HUANG Yong-qin, ZHU Ying. Design and Implementation of Separated Path Floating-point Fused Multiply-Add Unit[J]. Computer Science, 2013, 40(8): 28-33. https://doi.org/

