计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 29-35.doi: 10.11896/jsjkx.201200135
姚建宇1,2, 张祎维3, 张广婷1, 贾海鹏1
YAO Jian-yu1,2, ZHANG Yi-wei3, ZHANG Guang-ting1, JIA Hai-peng1
摘要: 作为基本的数学运算,三角函数的高性能实现对构建处理器的基础软件生态具有重要意义,特别是当前处理器都采用了SIMD架构,基于SIMD实现高性能三角函数具有重要的研究意义和应用价值。对此,文中采用数值分析的方法,对5个常用的三角函数sin,cos,tan,atan,atan2进行了高性能的实现与优化。首先通过分析浮点数IEEE754标准,设计了高效的三角函数算法;然后通过多项式逼近算法中的泰勒公式、帕德近似及雷米兹算法提升了算法精度;最后利用指令流水线与SIMD优化进一步提升了算法性能。实验结果表明,在满足精度的前提下,所实现的三角函数,相较于libm算法库和ARM_M 算法库,在ARM V8计算平台上都获得了较大的性能提升,其中相比libm算法库有1.77~6.26倍的时间性能提升,相比ARM_M算法库有1.34~1.5倍的时间性能提升。
中图分类号:
[1]FU S Y,WU J J,HSU W C.Improving SIMD code generation in QEMU[C]//2015 Design,Automation & Test in Europe Conference & Exhibition(DATE).IEEE,2015:1233-1236. [2]SHIBATA N.Efficient evaluation methods of elementary functions suitable for SIMD computation[J].Computer Science-Research and Development,2010,25(1):25-32. [3]STEPHENS N,BILES S,BOETTCHER M,et al.The ARM scalable vector extension[J].IEEE Micro,2017,37(2):26-39. [4]CHEN S M,GUO S Z,CHEN J X,et al.Optimization Algorithm for Trigonometric Functions Based on Processor with SIMD Function Components[J].Journal of Information Engineering University,2011,12(1):103-106. [5]CAO D,GUO S Z,ZHANG X.Implementation and Optimization of Extended Function Library Based on SW26010 Processor[J].Computer Engineering,2017(1):61-66. [6]LI Q Y,WANG N C,YI D Y.Numerical analysis 5th edition[M].Tsinghua University Press,2008. [7]HACKBUSCH W.Computation of best $$ L∧{\infty} $$ L∞ exponential sums for 1/x by Remez' algorithm[J].Computing and Visualization in Science,2019,20(1):1-11. [8]GLUZMAN S,YUKALOV V I.Self-similarly corrected Pade approximants for nonlinear equations[J].International Journal of Modern Physics B,2019,33(29):1950353. [9]HE Z,ZHANG J,YAO Z.Determining the optimal coefficients of the explicit finite difference scheme using the Remez exchange algorithm[J].Geophysics,2019,84(3):S137-S147. [10]MACCHIARELLA G.“Equi-Ripple” Synthesis of Multiband Prototype Filters Using a Remez-Like Algorithm[J].IEEE Microwave and Wireless Components Letters,2013,23(5):231-233. [11]VINSCHEN C,JOHNSTON J.Standard C math library[OL].https://www.sourceware.org/newlib/libm.html. [12]ARM.Arm Performance Libraries Reference Guide[OL]. https://static.docs.arm.com/101004/1920/arm_performance_libraries_reference_101004_1920_00_en.pdf. [13]ARM.Arm Optimized Routines[OL].https://github.com/ARM-software/optimized-routines. [14]ZHA Y L.Qin Jiushao's mathematical thinking method[J].Research on Dialectics of Nature,2003,19(1):87-92. |
[1] | 李浩东, 胡洁, 范勤勤. 基于并行分区搜索的多模态多目标优化及其应用 Multimodal Multi-objective Optimization Based on Parallel Zoning Search and Its Application 计算机科学, 2022, 49(5): 212-220. https://doi.org/10.11896/jsjkx.210300019 |
[2] | 蒋化南, 张帅, 林宇斐, 李豪. 基于MPI的分布式并行Gazebo仿真优化与测试 Simulation Optimization and Testing Based on Gazebo of MPI Distributed Parallelism 计算机科学, 2021, 48(11A): 672-677. https://doi.org/10.11896/jsjkx.210100109 |
[3] | 李爽, 赵荣彩, 王磊. 面向申威1621通用矩阵乘算法的实现与优化 Implementation and Optimization of Sunway1621 General Matrix Multiplication Algorithm 计算机科学, 2021, 48(11A): 699-704. https://doi.org/10.11896/jsjkx.201200150 |
[4] | 陈国良, 张玉杰. 并行计算学科发展历程 Development of Parallel Computing Subject 计算机科学, 2020, 47(8): 1-4. https://doi.org/10.11896/jsjkx.200600027 |
[5] | 蔡于涵,熊淑华,孙伟恒,Karn Pradeep,何小海. 基于运动矢量细化的帧率上变换与HEVC结合的视频压缩算法 Video Compression Algorithm Combining Frame Rate Up-conversion with HEVC Standard Based on Motion Vector Refinement 计算机科学, 2020, 47(2): 76-82. https://doi.org/10.11896/jsjkx.190500092 |
[6] | 汪洋, 李鹏, 季一木, 樊卫北, 张玉杰, 王汝传, 陈国良. 高性能计算与天文大数据研究综述 High Performance Computing and Astronomical Data:A Survey 计算机科学, 2020, 47(1): 1-6. https://doi.org/10.11896/jsjkx.190900042 |
[7] | 徐传福,王曦,刘舒,陈世钊,林玉. 基于Python的大规模高性能LBM多相流模拟 Large-scale High-performance Lattice Boltzmann Multi-phase Flow Simulations Based on Python 计算机科学, 2020, 47(1): 17-23. https://doi.org/10.11896/jsjkx.190500009 |
[8] | 龚彤艳,张广婷,贾海鹏,袁良. 一种偶数基Cooley-Tukey FFT高性能实现方法 High-performance Implementation Method for Even Basis of Cooley-Tukey FFT 计算机科学, 2020, 47(1): 31-39. https://doi.org/10.11896/jsjkx.190900179 |
[9] | 颜辉, 朱伯靖, 万文, 钟英, DavidAYune. 基于超算暨HPIC-LBM的大时空尺度三维湍流磁重联 HPIC-LBM Method Based Simulation of Large Temporal-Spatial Scale 3D Turbulent Magnetic Reconnection on Supercomputer 计算机科学, 2019, 46(8): 89-94. https://doi.org/10.11896/j.issn.1002-137X.2019.08.014 |
[10] | 贾迅, 钱磊, 邬贵明, 吴东, 谢向辉. FPGA应用于高性能计算的研究现状和未来挑战 Research Advances and Future Challenges of FPGA-based High Performance Computing 计算机科学, 2019, 46(11): 11-19. https://doi.org/10.11896/jsjkx.191100500C |
[11] | 周蓓, 黄永忠, 许瑾晨, 郭绍忠. 向量数学库的向量化方法研究 Study on SIMD Method of Vector Math Library 计算机科学, 2019, 46(1): 320-324. https://doi.org/10.11896/j.issn.1002-137X.2019.01.050 |
[12] | 张云泉. 2018年中国高性能计算机发展现状分析与展望 State-of-the-art Analysis and Perspectives of 2018 China HPC Development 计算机科学, 2019, 46(1): 1-5. https://doi.org/10.11896/j.issn.1002-137X.2019.01.001 |
[13] | 金星彤,李鹏,王刚,刘晓光,李忠伟. 基于异或的隐私保护码优化研究 Optimizing Small XOR-based Non-systematic Erasure Codes 计算机科学, 2017, 44(6): 36-42. https://doi.org/10.11896/j.issn.1002-137X.2017.06.006 |
[14] | 单娜娜,周巍,段哲民. HEVC中的变换系数熵编码优化算法 Improved Entropy Coding Algorithm for Transform Coefficients in HEVC 计算机科学, 2017, 44(6): 290-293. https://doi.org/10.11896/j.issn.1002-137X.2017.06.051 |
[15] | 杨玉星,邱亚娜. k元n维冒泡排序网络的子网排除 Sub-network Preclusion in (n,k)-bubble-sort Networks 计算机科学, 2017, 44(11): 264-267. https://doi.org/10.11896/j.issn.1002-137X.2017.11.039 |
|