Computer Science, 2021, Vol. 48, Issue 11A: 699-704. doi: 10.11896/jsjkx.201200150
• Interdiscipline & Application •
LI Shuang, ZHAO Rong-cai, WANG Lei