Computer Science ›› 2013, Vol. 40 ›› Issue (8): 28-33.


Design and Implementation of Separated Path Floating-point Fused Multiply-Add Unit

HE Jun, HUANG Yong-qin and ZHU Ying

Online: 2018-11-16   Published: 2018-11-16

Abstract: A fused multiply-add (FMA) unit increases the latency of standalone floating-point addition and multiplication operations. To address this shortcoming, a separated path FMA (SPFMA) unit was designed and implemented. By separating the multiplication and addition paths, the SPFMA unit reduces the latency of multiplication and addition from 6 cycles to 4 cycles while keeping the FMA operation latency at 6 cycles. The SPFMA unit was then logically synthesized with a specific technology cell library; it can operate above 1.2 GHz with an area of about 60779.44 μm². Finally, the performance of the SPFMA unit was evaluated on a hardware emulation acceleration platform by running the SPEC CPU2000 floating-point benchmarks. All of the benchmarks improved, by up to 5.25% and by 1.61% on average, which indicates that the SPFMA unit helps to further improve floating-point performance.

Key words: Floating-point add, Floating-point multiply, Fused multiply-add, Separated path, Floating-point performance, Operation latency
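
The latency figures in the abstract can be illustrated with a minimal sketch. It assumes a fully dependent chain of operations in which each operation waits for the previous result, so the latencies simply add up. The cycle counts (6 and 4) come from the abstract; the operation mix, the names `TRADITIONAL_FMA`, `SPFMA` and `chain_latency`, and the serial model itself are hypothetical and are not the evaluation method used in the paper.

```python
# Per-operation latencies in cycles, taken from the abstract.
TRADITIONAL_FMA = {"mul": 6, "add": 6, "fma": 6}  # every op uses the full FMA pipeline
SPFMA = {"mul": 4, "add": 4, "fma": 6}            # separate, shorter mul/add paths


def chain_latency(ops, latency):
    """Total cycles for a chain in which each op depends on the previous result."""
    return sum(latency[op] for op in ops)


# Hypothetical dependent chain: two multiplies, one add, one fused multiply-add.
ops = ["mul", "mul", "add", "fma"]

print(chain_latency(ops, TRADITIONAL_FMA))  # 24
print(chain_latency(ops, SPFMA))            # 18
```

In this simplified model only the standalone multiplies and adds get faster, so the gain depends on how much of the dependent chain consists of those operations rather than fused ones.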

