Computer Science ›› 2019, Vol. 46 ›› Issue (1): 320-324. doi: 10.11896/j.issn.1002-137X.2019.01.050
• Interdisciplinary and Frontier •
周蓓1, 黄永忠2, 许瑾晨1, 郭绍忠1
ZHOU Bei1, HUANG Yong-zhong2, XU Jin-chen1, GUO Shao-zhong1
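Abstract: The emergence of SIMD technology has made extending basic math libraries into vector math libraries an inevitable trend. Most functions in a basic math library have complex implementations and many branch decisions, which makes them harder to vectorize. In addition, because SIMD instruction sets are incomplete, some parts of a function cannot be vectorized directly, and the frequent splitting and splicing of vectors that results degrades performance. To address these problems, this paper proposes a vectorization method for vector math libraries consisting of three steps: identifying the core code segment, vectorizing the data preprocessing, and vectorizing the instructions. With these steps a basic math library can be vectorized quickly and effectively. Experiments show that with this method the performance of typical functions such as exp, pow, and log10 improves by 24.2% on average.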
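The following is a minimal sketch of why branch-heavy scalar code resists vectorization and how lane-wise masks can replace the branches. It is not the paper's implementation: Python lists stand in for SIMD registers, and the names `exp_scalar`, `exp_vector`, `select`, the lane count `LANES`, and the thresholds 709.0 and -745.0 are all assumptions chosen for illustration.

```python
import math

LANES = 4           # hypothetical vector width
OVERFLOW = 709.0    # illustrative threshold above which the result is +inf
UNDERFLOW = -745.0  # illustrative threshold below which the result is 0.0


def exp_scalar(x):
    """Scalar style: special cases are handled by early-return branches."""
    if x > OVERFLOW:
        return math.inf
    if x < UNDERFLOW:
        return 0.0
    return math.exp(x)


def select(mask, a, b):
    """Lane-wise select: take a[i] where mask[i] is true, else b[i]."""
    return [ai if m else bi for m, ai, bi in zip(mask, a, b)]


def exp_vector(xs):
    """Branch-free style: every lane follows the same instruction stream."""
    # Preprocessing for all lanes at once: build masks for the special cases.
    over = [x > OVERFLOW for x in xs]
    under = [x < UNDERFLOW for x in xs]
    # Clamp inputs so the core computation is safe on every lane.
    safe = [min(max(x, UNDERFLOW), OVERFLOW) for x in xs]
    # Core code segment, applied uniformly to all lanes.
    core = [math.exp(x) for x in safe]
    # Patch the special-case lanes with selects instead of branches.
    out = select(over, [math.inf] * LANES, core)
    out = select(under, [0.0] * LANES, out)
    return out
```

Calling `exp_vector([0.0, 1.0, 1000.0, -1000.0])` gives the same values as applying `exp_scalar` to each element. On real hardware the masks and selects would map to compare and blend instructions, where the target instruction set provides them.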