Design of Quadruple Precision Floating-point Fused Multiply-Add Unit Based on SIMD Device

Abstract

Abstract: Reducing the hardware cost and operation latency of quadruple precision floating-point arithmetic is an important problem. To decrease the hardware cost of the quadruple precision floating-point fused multiply-add (QPFMA) unit, a new QPFMA unit was designed and implemented based on a SIMD device that supports 64-bit × 4 double precision floating-point fused multiply-add (DPFMA). The new QPFMA supports four kinds of FMA operation, multiplication, addition, subtraction and comparison, with an operation latency of 7 cycles. By decomposing the 113-bit × 113-bit multiplication of the quadruple precision fraction into four 57-bit × 57-bit multiplications that share the 53-bit × 53-bit multipliers of the SIMD DPFMA, the hardware cost of the new QPFMA is greatly reduced. The new QPFMA was synthesized with a 65 nm cell library. The results show that its frequency is 1.1 GHz and its area is 42.71% of that of a normal QPFMA unit, which is only equal to the area of a DPFMA unit. Compared with a current QPFMA design, the operation latency decreases by 3 cycles and the gate count is reduced by 65.96% in an equivalent technology and at a comparable frequency.
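The decomposition described in the abstract can be illustrated with plain integer arithmetic. The following is a minimal sketch, not the authors' hardware design: it assumes the 113-bit significand is split into a low half of 57 bits and a high half of the remaining 56 bits, and the names `split` and `mul113` are hypothetical. The abstract does not say how the 57-bit × 57-bit products are mapped onto the 53-bit × 53-bit multipliers, so that step is not modelled here.

```python
SIG_BITS = 113   # binary128 significand width, hidden bit included
HALF_BITS = 57   # operand width of each partial product (from the abstract)


def split(x):
    """Split a 113-bit significand into (high, low) halves.

    Assumption: low half = lowest 57 bits, high half = remaining 56 bits.
    Both halves therefore fit in a 57-bit multiplier operand.
    """
    low = x & ((1 << HALF_BITS) - 1)
    high = x >> HALF_BITS
    return high, low


def mul113(a, b):
    """Multiply two 113-bit significands using four 57-bit x 57-bit products."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)

    # The four partial products; each operand is at most 57 bits wide.
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi
    ll = a_lo * b_lo

    # Recombine with the appropriate weights (shifts by 57 and 114 bits).
    return (hh << (2 * HALF_BITS)) + ((hl + lh) << HALF_BITS) + ll


if __name__ == "__main__":
    a = (1 << 112) | 0x1234567890ABCDEF   # arbitrary normalized significand
    b = (1 << SIG_BITS) - 1               # all-ones significand
    print(mul113(a, b) == a * b)
```

In hardware the four partial products would be produced by multiplier arrays and summed in an adder tree; the sketch only checks that the four-way split reproduces the full 226-bit product.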

Key words: Floating-point, SIMD device, Fused multiply-add, Quadruple precision, High precision

HE Jun, HUANG Yong-qin and ZHU Ying. Design of Quadruple Precision Floating-point Fused Multiply-Add Unit Based on SIMD Device[J]. Computer Science, 2013, 40(12): 15-18.

