Computer Science ›› 2025, Vol. 52 ›› Issue (6): 66-73. doi: 10.11896/jsjkx.240700009

• High Performance Computing •

Design and Research of SIMD Programming Interface for Sunway

JIANG Jun, GU Xiaoyang, XU Kunkun, LYU Yongshuai, HUANG Liangming   

  1. Wuxi Institute of Advanced Technology, Wuxi, Jiangsu 214122, China
  • Received: 2024-07-01 Revised: 2024-11-22 Online: 2025-06-15 Published: 2025-06-11
  • About author: JIANG Jun, born in 1980, M.S., senior engineer. His main research interests include compiler optimization and architecture-oriented performance analysis and optimization.
    HUANG Liangming, born in 1988, Ph.D., engineer. His main research interests include compiler optimization and architecture-oriented performance analysis and optimization.

Abstract: On the domestically produced Sunway high-performance systems, the Sunway GCC compiler has difficulty vectorizing complex programs through automatic vectorization or inline assembly during compilation, which limits the performance of Sunway processors. To address such non-vectorizable programs, a SIMD programming interface was designed and implemented in the Sunway compiler. Vector machine modes and vector data types, based on the Sunway vector instructions, are added to the Sunway GCC compiler so that it can recognize vector parameter types. Depending on the type and complexity of each vector instruction, the instructions are exposed through intrinsic functions, operator extension, or high-level language extension, which together implement the SIMD programming interface. Instruction templates are added to the backend so that each operation matches the appropriate template and the compiler generates assembly code for the corresponding vector instruction. Tests on the FFTW and Hyperscan libraries show that, after vectorization with the SIMD programming interface, the average speedup for FFTW is 1.97 for the double type and 2.13 for the float type, and the average speedup for Hyperscan is 2.94.
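To give a rough sense of the operator-based route the abstract mentions, the following is a minimal sketch written with the generic vector extension of mainline GCC, not with the Sunway interface itself, whose type and function names the abstract does not give. The type name `v4df` and the functions `axpy_scalar` and `axpy_vec` are hypothetical and chosen only for illustration. A vector data type is declared once, and ordinary `*` and `+` operators then apply to whole vectors, leaving the backend to select the matching vector instructions.

```c
#include <string.h>

/* Hypothetical name: a vector of four doubles, declared with the generic
   vector extension of mainline GCC (an assumption; not the Sunway types). */
typedef double v4df __attribute__((vector_size(32)));

/* Scalar reference: y[i] = a * x[i] + y[i]. */
static void axpy_scalar(double a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same computation with vector operators, four elements per step.
   The remainder loop handles lengths that are not a multiple of 4. */
static void axpy_vec(double a, const double *x, double *y, int n)
{
    const v4df va = {a, a, a, a};
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        v4df vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* unaligned-safe load */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;               /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);   /* unaligned-safe store */
    }
    for (; i < n; i++)                   /* scalar tail */
        y[i] = a * x[i] + y[i];
}
```

Calling `axpy_vec(2.0, x, y, n)` should leave `y` with the same values as `axpy_scalar` on the same inputs. On a target without 256-bit vector hardware, GCC lowers the operators to narrower or scalar instructions, so the sketch stays portable.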

Key words: Vectorization, SIMD programming interface, Vector instruction, Intrinsic function, Instruction template

CLC Number: 

  • TP314