Design and Research of SIMD Programming Interface for Sunway

doi:10.11896/jsjkx.240700009

Abstract

On domestically produced Sunway high-performance systems, the Sunway GCC compiler struggles to vectorize complex programs through automatic vectorization or inline assembly during compilation, which limits the performance of Sunway processors. To address programs that cannot be vectorized in these ways, a SIMD programming interface was designed and implemented in the Sunway compiler. Vector machine modes and vector data types, based on the Sunway vector instructions, are added to the Sunway GCC compiler so that it can recognize vector parameter types. Depending on the type and complexity of each vector instruction, the instruction is exposed through intrinsic functions, operator extensions, or high-level language extensions, which together implement the SIMD programming interface. Instruction templates are added to the backend so that each interface call matches an appropriate template and generates assembly code for the corresponding vector instruction. Tests on the FFTW and Hyperscan libraries show that, after vectorizing the programs with the SIMD programming interface, the average speedup for FFTW is 1.97 for the double type and 2.13 for the float type, and the average speedup for Hyperscan is 2.94.
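The operator-based style of vector programming mentioned in the abstract can be illustrated with a minimal sketch in C. It uses GCC's generic vector extension (`__attribute__((vector_size(...)))`), which is available in mainline GCC, and not the Sunway interface itself: the type name `v4df`, the function names, and the 256-bit vector width are assumptions chosen for illustration, and the actual Sunway types and intrinsic names are not given in the abstract.

```c
#include <assert.h>
#include <string.h>

/* GCC generic vector extension: a 32-byte vector of four doubles.
   The name v4df and the 256-bit width are assumptions for this sketch;
   they are not taken from the Sunway interface. */
typedef double v4df __attribute__((vector_size(32)));

/* Scalar reference: y[i] = a * x[i] + y[i]. */
static void axpy_scalar(double a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Vector version written with ordinary operators on vector types.
   The compiler maps '*' and '+' to vector instructions where the
   target provides them, and to scalar code otherwise. */
static void axpy_vector(double a, const double *x, double *y, int n)
{
    assert(n % 4 == 0);              /* sketch handles full vectors only */
    v4df va = {a, a, a, a};          /* broadcast the scalar */
    for (int i = 0; i < n; i += 4) {
        v4df vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load without alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;               /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);   /* store the result */
    }
}
```

Calling `axpy_vector(2.0, x, y, 8)` on two 8-element arrays gives the same result as `axpy_scalar`. An intrinsic-function interface would replace the operator expression with explicit named calls, one per vector instruction.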

Key words: Vectorization, SIMD programming interface, Vector instruction, Intrinsic function, Instruction template

CLC Number: TP314

JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming. Design and Research of SIMD Programming Interface for Sunway [J]. Computer Science, 2025, 52(6): 66-73.
