Design and Research of a SIMD Programming Interface for the Sunway Platform

doi:10.11896/jsjkx.240700009

Computer Science, 2025, Vol. 52, Issue (6): 66-73. doi: 10.11896/jsjkx.240700009


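JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming

Wuxi Institute of Advanced Technology, Wuxi, Jiangsu 214122

Received: 2024-07-01  Revised: 2024-11-22  Publication date: 2025-06-15  Release date: 2025-06-11
Corresponding author: HUANG Liangming (liangming_huang@126.com)
About the author: (goodsun_jj@163.com)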

Design and Research of SIMD Programming Interface for Sunway

JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming

Wuxi Institute of Advanced Technology, Wuxi, Jiangsu 214122, China

Received: 2024-07-01  Revised: 2024-11-22  Online: 2025-06-15  Published: 2025-06-11
About author: JIANG Jun, born in 1980, M.S., senior engineer. His main research interests include compiler optimization and architecture-oriented performance analysis and optimization.
HUANG Liangming, born in 1988, Ph.D., engineer. His main research interests include compiler optimization and architecture-oriented performance analysis and optimization.

Abstract

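Abstract: On domestic Sunway processors, the Sunway GCC compiler struggles to vectorize certain complex programs with automatic vectorization and inline assembly, which limits the performance the processors can deliver. To address programs that cannot be vectorized this way, this work designs and studies a SIMD programming interface in the Sunway GCC compiler. Building on the Sunway vector instructions, vector machine modes and vector data types are added to the compiler so that it can recognize vector parameter types. Depending on the type and complexity of each vector instruction, the interface functions are implemented in one of three ways: built-in function extension, operator extension, or high-level language extension. Instruction templates are added to the back end so that each interface function matches its template and the compiler emits assembly for the corresponding vector instruction. Tests on the FFTW and Hyperscan libraries show that, relative to the programs before optimization, vectorizing with the SIMD programming interface yields average speedups of 1.97 and 2.13 for the Double and Float programs in FFTW, respectively, and 2.94 for Hyperscan.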

Keywords: vectorization, SIMD programming interface, vector instructions, built-in functions, instruction templates

Abstract: On domestically produced Sunway high-performance systems, the Sunway GCC compiler finds it challenging to vectorize complex programs with automatic vectorization and inline assembly during compilation, which impedes the performance of domestically produced Sunway processors. To address programs that cannot be vectorized, a SIMD programming interface has been researched and designed within the Sunway compiler. Vector machine modes and vector data types, based on the Sunway vector instructions, are added to the Sunway GCC compiler so that it can recognize vector parameter types. Depending on the type and complexity of each vector instruction, the SIMD programming interface functions are implemented through intrinsic function extension, operator extension, or high-level language extension. Different instruction templates are added to the back end so that each interface function matches the appropriate template and assembly code is generated for the corresponding vector instruction. Testing and analysis of the FFTW and Hyperscan libraries show that, after the programs are vectorized with the SIMD programming interface, the average speedups for FFTW are 1.97 for the Double type and 2.13 for the Float type, while the average speedup for Hyperscan is 2.94.

Key words: Vectorization, SIMD programming interface, Vector instruction, Intrinsic function, Instruction template
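The abstract does not list the Sunway interface functions themselves, so the following is only a rough sketch of what two of the three extension styles look like from the programmer's side. It uses GCC's generic vector extension (`vector_size`) and the generic shuffle built-in as stand-ins; the type names `v4sf` and `v4si` and the functions `saxpy_v4` and `reverse_v4` are hypothetical and are not taken from the paper or from the Sunway toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Vector data types (assumed names): four 32-bit lanes in 128 bits.
 * Generic GCC vector extension, not a Sunway-specific type. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Operator extension: y[i] = a * x[i] + y[i], four lanes per iteration. */
static void saxpy_v4(float a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);                  /* sketch only: no scalar tail loop */
    const v4sf va = {a, a, a, a};        /* broadcast the scalar to all lanes */
    for (size_t i = 0; i < n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* alignment-agnostic vector load */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;               /* ordinary operators on vector types */
        memcpy(y + i, &vy, sizeof vy);   /* vector store */
    }
}

/* Built-in function extension: reverse the four lanes with a shuffle built-in. */
static v4sf reverse_v4(v4sf v)
{
#if defined(__clang__)
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
#else
    const v4si mask = {3, 2, 1, 0};      /* lane i of the result takes v[mask[i]] */
    return __builtin_shuffle(v, mask);
#endif
}
```

With `x = {1, ..., 8}` and `y = {10, 20, ..., 80}`, calling `saxpy_v4(2.0f, x, y, 8)` leaves `y[0] == 12` and `y[7] == 96`. In this style the compiler, not the programmer, maps each operator or built-in call to a machine instruction, which is the role the abstract assigns to the back-end instruction templates.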

CLC Number: TP314

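JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming. Design and Research of a SIMD Programming Interface for the Sunway Platform[J]. Computer Science, 2025, 52(6): 66-73. https://doi.org/10.11896/jsjkx.240700009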

JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming. Design and Research of SIMD Programming Interface for Sunway[J]. Computer Science, 2025, 52(6): 66-73. https://doi.org/10.11896/jsjkx.240700009

