Computer Science ›› 2025, Vol. 52 ›› Issue (4): 291-300. DOI: 10.11896/jsjkx.241100030
• High Performance Computing •
LI Qing1,2, JIA Haipeng2, ZHANG Yunquan2, ZHANG Sijia1