Computer Science ›› 2019, Vol. 46 ›› Issue (1): 320-324. doi: 10.11896/j.issn.1002-137X.2019.01.050

• Interdiscipline & Frontier •

Study on SIMD Method of Vector Math Library

ZHOU Bei¹, HUANG Yong-zhong², XU Jin-chen¹, GUO Shao-zhong¹

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China
  2. Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2018-01-26 Online: 2019-01-15 Published: 2019-02-25

Abstract: With the advent of SIMD, the move from basic (scalar) math libraries to vector math libraries is inevitable. Vectorizing a math library is difficult, however, because its code is complicated and contains many branches. In addition, SIMD instruction sets are incomplete, so some functions have to be implemented by frequently splitting and rejoining vector data, which sharply reduces performance. This paper proposes an effective vectorization method for vector math libraries that consists of three parts: key code segment selection, vectorization of data pre-processing, and instruction vectorization. The method obtains as much performance improvement as possible and also provides a solid basis for later in-depth optimization. Experimental results show that it significantly improves the performance of functions such as exp, pow and log10, with an average improvement of up to 24.2%.

Key words: Data pre-processing, Instruction vectorization, Key code segment, SIMD technique, Vector math library

CLC Number: TP313
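
The abstract names the problem (branches and scalar "split and join" inside vector functions) without showing code. The following is a minimal illustrative sketch of how data pre-processing can be kept lane-wise and branch-free for an exp-like function. It is not the authors' implementation: the name `vexp_sketch`, the width `LANES`, the helper `pow2_int`, the constants and the Taylor polynomial are all assumptions chosen for illustration, and the lanes are written as plain C loops that a compiler may or may not map onto SIMD instructions.

```c
#include <math.h>    /* INFINITY macro only; no libm functions are called */
#include <stdint.h>
#include <string.h>

#define LANES 4      /* hypothetical vector width: four doubles */

static const double LN2     = 0.6931471805599453;
static const double INV_LN2 = 1.4426950408889634;
static const double X_MAX   = 709.782712893384;    /* exp overflows above this */
static const double X_MIN   = -745.1332191019411;  /* exp rounds to 0 below this */

/* Build 2^e from the IEEE-754 exponent field; valid for -1022 <= e <= 1023. */
static double pow2_int(int e) {
    uint64_t bits = (uint64_t)(e + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical lane-wise exp: every lane runs the same instruction stream. */
void vexp_sketch(const double x[LANES], double out[LANES]) {
    double r[LANES], p[LANES];
    int k[LANES];

    /* Data pre-processing: neutralize NaN, clamp to the finite range, and
       reduce x = k*ln2 + r using selects instead of per-lane branches. */
    for (int i = 0; i < LANES; i++) {
        double xi = (x[i] == x[i]) ? x[i] : 0.0;   /* NaN lanes use a dummy 0 */
        xi = xi > X_MAX ? X_MAX : xi;
        xi = xi < X_MIN ? X_MIN : xi;
        double t = xi * INV_LN2;
        k[i] = (int)(t + (t >= 0.0 ? 0.5 : -0.5)); /* round to nearest */
        r[i] = xi - k[i] * LN2;
    }

    /* Key code segment: degree-13 Taylor polynomial of exp(r) in Horner form. */
    for (int i = 0; i < LANES; i++) p[i] = 1.0;
    for (int n = 13; n >= 1; n--)
        for (int i = 0; i < LANES; i++)
            p[i] = 1.0 + r[i] * p[i] / n;

    /* Scale by 2^k in two steps (k may exceed one exponent field), then
       select the special-case results per lane. */
    for (int i = 0; i < LANES; i++) {
        int k1 = k[i] / 2, k2 = k[i] - k1;
        double y = p[i] * pow2_int(k1) * pow2_int(k2);
        y = x[i] > X_MAX ? INFINITY : y;
        y = x[i] < X_MIN ? 0.0 : y;
        y = (x[i] == x[i]) ? y : x[i];             /* propagate NaN */
        out[i] = y;
    }
}
```

A caller would fill a `double in[LANES]` array, declare `double out[LANES]`, and call `vexp_sketch(in, out)`. The design point is that overflow, underflow and NaN inputs are handled by clamping before the arithmetic and by selection after it, so no lane has to be split off and processed by a scalar routine.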