Computer Science ›› 2021, Vol. 48 ›› Issue (6): 26-33. doi: 10.11896/jsjkx.200400007

• Computer Architecture •

Implementation of Transcendental Functions on Vectors Based on SIMD Extensions

LIU Dan, GUO Shao-zhong, HAO Jiang-wei, XU Jin-chen   

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450002, China
  • Received: 2020-04-02 Revised: 2020-07-20 Online: 2021-06-15 Published: 2021-06-03
  • About author: LIU Dan, born in 1988, postgraduate, is a student member of the China Computer Federation. His main research interests include high-performance computing. (liudanmath@foxmail.com)
    XU Jin-chen, born in 1987, Ph.D., lecturer, is a member of the China Computer Federation. His main research interests include high-performance computing.
  • Supported by:
    National Natural Science Foundation of China(61802434).

Abstract: The basic mathematical function library is a critical software module in a computer system. However, on the domestic Shenwei platform, long-vector transcendental functions can currently be implemented only indirectly, by calling the system scalar functions in a loop, which limits the computing capability of the platform's SIMD extensions. To solve this problem, this paper implements long-vector transcendental functions through low-level optimization based on the SIMD extensions of the Shenwei platform, and proposes a floating-point computing fusion algorithm to address the difficulty of vectorizing algorithms with a two-branch structure. It also proposes a method for implementing higher-degree polynomials based on dynamic grouping in the Estrin algorithm, which improves the pipelining performance of polynomial evaluation in assembly. This is the first implementation of a long-vector transcendental function library on the Shenwei platform. The provided function interfaces include trigonometric, inverse trigonometric, logarithmic, and exponential functions, among others. Experimental results show that the maximum error of the double-precision version is kept below 3.5 ULP (unit in the last place) and that of the single-precision version below 0.5 ULP. Compared with the scalar functions of the Shenwei platform, performance is significantly improved, with an average speedup ratio of 3.71.
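
The Estrin scheme named in the abstract can be illustrated with a minimal sketch in C. It is not the paper's implementation: it assumes a fixed degree-7 polynomial with a static pairing of terms rather than the dynamic grouping the paper proposes, it uses portable scalar C rather than Shenwei SIMD instructions, and the names `horner7`, `estrin7`, and `estrin7_array` are hypothetical. The point is only to show why Estrin-style grouping suits a pipelined unit: the four inner pairs are independent of each other, whereas Horner's rule forms one serial chain.

```c
#include <assert.h>
#include <stddef.h>

/* Horner's rule for c[0] + c[1]*x + ... + c[7]*x^7.
 * Each step depends on the previous one, so the multiply-adds
 * form a single serial dependency chain. */
double horner7(const double c[8], double x)
{
    double r = c[7];
    for (int i = 6; i >= 0; --i) {
        r = r * x + c[i];
    }
    return r;
}

/* Estrin's scheme for the same polynomial (static grouping, an
 * assumption of this sketch). The four pairs p01..p67 do not depend
 * on each other, so a pipelined FPU can overlap them. */
double estrin7(const double c[8], double x)
{
    double x2 = x * x;
    double x4 = x2 * x2;

    double p01 = c[0] + c[1] * x;
    double p23 = c[2] + c[3] * x;
    double p45 = c[4] + c[5] * x;
    double p67 = c[6] + c[7] * x;

    double p03 = p01 + p23 * x2;   /* terms of degree 0..3 */
    double p47 = p45 + p67 * x2;   /* terms of degree 4..7, divided by x^4 */

    return p03 + p47 * x4;
}

/* Element-wise evaluation over an array. There is no branch in the
 * loop body, so a compiler is free to map it onto SIMD lanes. */
void estrin7_array(const double c[8], const double *x, double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = estrin7(c, x[i]);
    }
}
```

With all eight coefficients set to 1 and x = 2, both routines evaluate the sum of 2^0 through 2^7. In general the two schemes can differ in the last bits because they round intermediate results in a different order, which is one reason a vector library's error has to be measured in ULP rather than assumed equal to the scalar version's.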

Key words: Basic mathematical library, Domestic platform, Floating-point calculation, Pipeline optimization, Vector transcendental function

CLC Number: TP311