Computer Science (计算机科学), 2019, Vol. 46, Issue (1): 320-324. doi: 10.11896/j.issn.1002-137X.2019.01.050

• Interdiscipline & Frontier •

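Study on SIMD Method of Vector Math Library

ZHOU Bei1, HUANG Yong-zhong2, XU Jin-chen1, GUO Shao-zhong1

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China
  2. Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2018-01-26 Online: 2019-01-15 Published: 2019-02-25
  • About the authors: ZHOU Bei (born 1977), female, PhD candidate, lecturer; main research interest: high-performance computing; E-mail: 13653970052@163.com (corresponding author). HUANG Yong-zhong (born 1968), male, professor, PhD supervisor; main research interests: distributed computing and big data processing. XU Jin-chen (born 1987), male, PhD; main research interest: high-performance computing. GUO Shao-zhong (born 1964), female, professor, master's supervisor; main research interest: high-performance computing.
  • Supported by:
    The project on a basic math library system for 100P high-efficiency computers, and the National Key R&D Program of China, "High Performance Computing" key special project: Key Technology Verification System for E-class Computers (2016YFB0200503).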

Abstract: The emergence of SIMD technology makes extending basic math libraries into vector math libraries an inevitable trend. Most functions in a basic math library have complex implementations and many branches, which makes them hard to vectorize. In addition, SIMD instruction sets are incomplete, so some operations in a function cannot be vectorized directly, and the frequent splitting and joining of vectors that results degrades performance. To address these problems, this paper proposes a vectorization method for vector math libraries that consists of three steps: determining the key code segment, vectorizing the data preprocessing, and vectorizing the instructions. The method allows a basic math library to be vectorized quickly and effectively, and it provides a solid base for later in-depth optimization. Experiments show that, with this method, the performance of typical functions such as exp, pow and log10 improves by 24.2% on average.

Key words: Data preprocessing, Instruction vectorization, Key code segment, SIMD technique, Vector math library
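
The abstract gives no code, so the following is only an illustrative sketch of the general idea behind vectorizing a branchy preprocessing step, not the authors' implementation. It assumes a hypothetical 4-lane vector and uses a range clamp as a stand-in for the input checks a math function performs. Each branch is replaced by a comparison mask and a bitwise select, so every lane follows the same instruction sequence and the vector never has to be split into scalars and rejoined. The names `LANES`, `select_f64` and `clamp_lanes` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 4 /* hypothetical vector width, chosen for illustration */

/* Scalar reference: a branchy range check of the kind that is hard to
   vectorize directly. */
static double clamp_scalar(double x, double lo, double hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Branch-free select: returns a where mask is all ones, b where mask is
   all zeros. This mirrors the blend/select operation that SIMD
   instruction sets typically offer. */
static double select_f64(uint64_t mask, double a, double b) {
    uint64_t ua, ub, ur;
    double r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & mask) | (ub & ~mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Lane-wise clamp with no data-dependent branches in the loop body.
   The loop stands in for a single vector operation over LANES elements. */
static void clamp_lanes(const double in[LANES], double out[LANES],
                        double lo, double hi) {
    assert(lo <= hi); /* precondition assumed by this sketch */
    for (size_t i = 0; i < LANES; ++i) {
        /* 0 - 1 wraps to all ones, 0 - 0 stays zero. */
        uint64_t below = (uint64_t)0 - (uint64_t)(in[i] < lo);
        uint64_t above = (uint64_t)0 - (uint64_t)(in[i] > hi);
        double v = select_f64(below, lo, in[i]);
        out[i] = select_f64(above, hi, v);
    }
}
```

For any input, `clamp_lanes` should agree with `clamp_scalar` applied to each lane. On real hardware the loop body would map to a vector compare followed by a vector select, provided the target instruction set offers both.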

CLC Number: TP313