Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230300200-8.doi: 10.11896/jsjkx.230300200
• Computer Software & Architecture • Previous Articles Next Articles
LEI Chao1,2, LIU Jiang1,2, SONG Jiawen3
CLC Number:
[1]AHMED N,NATARAJAN T,RAO K R.Discrete cosine transform[J].IEEETransactions on Computers,1974,100(1):90-93. [2]GERSHO A,GRAY R M.Vector quantization and signal compression:volume 159[M].Springer Science Business Media,2012. [3]LECUN Y,BENGIO Y,HINTON G.Deep learning[J].Na-ture,2015,521(7553):436-444. [4]STRASSEN V.Gaussian elimination is not optimal[J].Nume-rische Mathematik,1969,13(4):354-356. [5]COPPERSMITH D,WINOGRAD S.Matrix multiplication viaarithmetic progressions[C]//Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing.1987:1-6. [6]DAVIS T A,RAJAMANICKAM S,SID-LAKHDAR W M.A survey of direct methods for sparse linear systems[J/OL].Acta Numerica,2016,25:383-566.https://www.cambridge.org/core/article/survey-of-direct-methods-for-sparse-linear-systems/8AE7AC55909389F7EA1F027855AC4044.DOI:10.1017/S0962492916000076. [7]CASASENT D,TAYLOR B K.Banded-matrix high-perfor-mance algorithm and architecture[J/OL].Applied Optics,1985,24(10):1476-1480.https://opg.optica.org/ao/abstract.cfm?URI=ao-24-10-1476.DOI:10.1364/AO.24.001476. [8]CHALLACOMBE M.A general parallel sparse-blocked matrix multiply for linear scaling scf theory[J/OL].Computer Physics Communications,2000,128(1):93-107.https://www.sciencedirect.com/science/article/pii/S0010465500000746.DOI:https://doi.org/10.1016/S0010-4655(00)00074-6. [9]RUBENSSON E H,RUDBERG E.Locality-aware parallelblock-sparse matrix-matrix multiplication using the chunks and tasks programming model[J/OL].Parallel Computing,2016,57:87-106.https://www.sciencedirect.com/science/article/pii/S0167819116300606.DOI:https://doi.org/10.1016/j.parco.2016.06.005 [10]CHOWDHERY A,NARANG S,DEVLIN J,et al.Palm:Scaling language modeling with pathways[J].arXiv:2204.02311,2022. [11]ASAIMEH M,DENOLF K,LO J,et al.Comparing energy efficiency of CPU,GPU and FPGA implementations for vision kernels[C]//2019 IEEE International Conference on Embedded Software and Systems(ICESS).IEEE:1-8. [12]LINDHOLM E,NICKOLLS J,OBERMAN S,et al.NVIDIATesla:A unified graphics and computing architecture[J/OL].IEEE Micro,2008,28(2):39-55.https://dx.doi.org/10.1109/mm.2008.31. [13]NVIDIA,VINGELMANN P,FITZEK F H.CUDA,release:10.2.89[EB/OL].2020.https://developer.nvidia.com/cuda-toolkit. [14]FATAHALIAN K,SUGERMAN J,HANRAHAN P.Under-standing the efficiency of GPU algorithms for matrix-matrix multiplication[C]//Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware.133-137. [15]HERAULT T,ROBERT Y,BOSILCA G,et al.Generic matrix multiplication for multi-gpu accelerated distributed-memory platforms over parsec[C]//2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems(ScalA).IEEE:33-41. [16]CLBLAS[EB/OL].https://github.com/clMathLibraries/cl-BLAS. [17]WANG Z W,CHENG L L,ZHAO W Q.Parallel Computation Performance Analysis Model Based on GPU[J].Computer Science,2014,41(1):31-38. [18]YIN M J,XU X B,XIONG Z G,et al.Quantitative Performance Analysis Model of Matrix Multiplication Based on GPU[J].Computer Science,2015,42(12):13-17. [19]DALTON S,OLSON L,BELL N.Optimizing sparse matrix-matrix multiplication for the ssGPU[J].ACM Transactions on Mathematical Software(TOMS),2015,41(4):1-20. [20]CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to algorithms,third edition(3rd ed)[M].The MIT Press,2009. [21]GUIDE D.CUDA C best practices guide[J].NVIDIA,July,2013. [22]STIVAL P,GIRAUD L.Performance and accuracy of the matrix multiplication routines:cuBLAS on NVIDIA tesla versus MKL and ATLAS on Intel nehalem[Z].2010. [23]KHALILOV M,TIMOVEEV A.Performance analysis of CUDA,OpenACC and OpenMP programming models on Tesla v100 GPU[J].Journal of Physics:Conference Series,2021,1740(1):012056. [24]BARRACHINA S,CASTILLO M,IGUAL F D,et al.Evaluation and tuning of the level 3 cuBLAS for graphics processors[C]//2008 IEEE International Symposium on Parallel and Distributed Processing.IEEE:1-8. [25]WONG H,PAPADOPOULOU M M,SADOOGHI-ALVANDI M,et al.Demystifying GPU microarchitecture through microbenchmarking[C]//IEEE International Symposium on Performance Analysis of Systems & Software(ISPASS 2010).2010:235-246.DOI:10.1109/ISPASS.2010.5452013. |
[1] | WANG Zhuang, WANG Pinghui, WANG Bincheng, WU Wenbo, WANG Bin, CONG Pengyu. GPU Shared Scheduling System Under Deep Learning Container Cloud Platform [J]. Computer Science, 2023, 50(6): 86-91. |
[2] | MO Shangfeng, ZHOU Zhenfen, HU Yonghua, XU Minmin, MAO Chunxian, YUAN Yudi. Transplantation and Optimization of Row-vector-matrix Multiplication in Complex Domain Based on FT-M7002 [J]. Computer Science, 2023, 50(11A): 220900277-6. |
[3] | GUO Zheng-wei, FU Ze-wen, LI Ning, BAI Lan. Study on Acceleration Algorithm for Raw Data Simulation of High Resolution Squint Spotlight SAR [J]. Computer Science, 2022, 49(8): 178-183. |
[4] | ZONG Di-di, XIE Yi-wu. Model Medial Axis Generation Method Based on Normal Iteration [J]. Computer Science, 2022, 49(6A): 764-770. |
[5] | WANG Jin, LIU Jiang. GPU-based Parallel DILU Preconditioning Technique [J]. Computer Science, 2022, 49(6): 108-118. |
[6] | HUANG Jia-wei, LI Xiao-peng, LING Cheng. Scalable Parallel Computing Method for Conditional Likelihood Probability of Nucleotide Molecular Phylogenetic Tree Based on GPU [J]. Computer Science, 2022, 49(11A): 210800189-7. |
[7] | WANG Bo-yang, PANG Jian-min, XU Jin-long, ZHAO Jie, TAO Xiao-han, ZHU Yu. Matrix Multiplication Vector Code Generation Based on Polyhedron Model [J]. Computer Science, 2022, 49(10): 44-51. |
[8] | LI Fan, YAN Xing, ZHANG Xiao-yu. Optimization of GPU-based Eigenface Algorithm [J]. Computer Science, 2021, 48(4): 197-204. |
[9] | HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28. |
[10] | WEN Min-hua, WANG Shen-peng, WEI Jian-wen, LI Lin-ying, ZHANG Bin, LIN Xin-hua. DGX-2 Based Optimization of Application for Turbulent Combustion [J]. Computer Science, 2021, 48(12): 43-48. |
[11] | HAN Xiao-dong, GAO Fei, ZHANG Li-wei. Novel Real-time Algorithm for Critical Path of Linear Network Coding [J]. Computer Science, 2020, 47(9): 232-237. |
[12] | WANG Liang, ZHOU Xin-zhi, YNA Hua. Real-time SIFT Algorithm Based on GPU [J]. Computer Science, 2020, 47(8): 105-111. |
[13] | LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12. |
[14] | ZUO Xian-yu, ZHANG Zhe, SU Yue-han, LIU Yang, GE Qiang, TIAN Jun-feng. Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model [J]. Computer Science, 2020, 47(4): 25-29. |
[15] | XU Xin-peng, HU Bin-xing. Fast Calculation Method of Aircraft Component Strength Check Based on ICCG [J]. Computer Science, 2020, 47(11A): 624-627. |
|