Design of Quadruple Precision Floating-point Fused Multiply-Add Unit Based on SIMD Device

Abstract

Reducing the hardware cost and latency of quadruple precision floating-point arithmetic is an important problem. To reduce the hardware cost of a quadruple precision floating-point fused multiply-add (QPFMA) unit, a new QPFMA unit was designed and implemented on top of a SIMD unit that supports 64-bit × 4 double precision floating-point fused multiply-add (DPFMA). The new QPFMA supports four kinds of FMA operations as well as multiplication, addition, subtraction, and comparison, with an operation latency of 7 cycles. By decomposing the 113-bit × 113-bit quadruple precision significand multiplier into four 57-bit × 57-bit multipliers that share the 53-bit × 53-bit multipliers of the SIMD DPFMA unit, the hardware cost of the QPFMA is reduced significantly. Logic synthesis with a 65 nm cell library shows that the QPFMA reaches a frequency of 1.1 GHz and that its area is 42.71% of a conventional QPFMA design, comparable to that of a single DPFMA unit. Compared with existing QPFMA designs at comparable technology and frequency, the operation latency is reduced by 3 cycles and the gate count by 65.96%.

Key words: Floating-point, SIMD device, Fused multiply-add, Quadruple precision, High precision
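The decomposition of the 113-bit × 113-bit multiplication into four 57-bit × 57-bit multiplications can be illustrated arithmetically. The sketch below is not the paper's circuit: it assumes each 113-bit operand is split into a low 57-bit half and a high 56-bit half (one plausible split, since the abstract does not state it), and the names `split113` and `mul113_via_four_57` are hypothetical. It only checks the identity that the four partial products, suitably shifted and added, reproduce the full product.

```python
# Arithmetic sketch only; the 57/56-bit split is an assumption, not taken from the paper.
MASK57 = (1 << 57) - 1


def split113(x: int) -> tuple[int, int]:
    """Split a 113-bit value into (hi, lo): lo is the low 57 bits, hi the upper 56 bits."""
    assert 0 <= x < (1 << 113)
    return x >> 57, x & MASK57


def mul113_via_four_57(a: int, b: int) -> int:
    """Multiply two 113-bit values using four products whose operands fit in 57 bits."""
    a_hi, a_lo = split113(a)
    b_hi, b_lo = split113(b)

    # Four partial products; every operand is below 2**57.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Recombine: a*b = p_hh*2^114 + (p_lh + p_hl)*2^57 + p_ll
    return (p_hh << 114) + ((p_lh + p_hl) << 57) + p_ll


if __name__ == "__main__":
    a = (1 << 112) | 0x1234_5678_9ABC_DEF0
    b = (1 << 113) - 1
    print(mul113_via_four_57(a, b) == a * b)
```

In hardware the partial products would be summed in a compressor tree rather than with integer additions; the sketch only shows why four multipliers of this width suffice.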

HE Jun, HUANG Yong-qin, ZHU Ying. Design of Quadruple Precision Floating-point Fused Multiply-Add Unit Based on SIMD Device[J]. Computer Science, 2013, 40(12): 15-18. https://doi.org/

