Computer Science ›› 2024, Vol. 51 ›› Issue (9): 15-22.doi: 10.11896/jsjkx.231000204
• High Performance Computing • Previous Articles Next Articles
GUO Shuaizhe, GAO Jianhua, JI Weixing
CLC Number:
[1]SAAD Y,SCHULTZ M H.GMRES:A generalized minimal residual algorithm for solving nonsymmetric linear systems[J].Society for Industrial and Applied Mathematics,1986,7(3):856-869. [2]DAVIS T A,HU Y F.The University of Florida Sparse Matrix Collection[J].ACM Transactions on Mathematical Software,2011,38(1):1-25. [3]Top500.June 2023 List[EB/OL].[2023-10-01].https://top500.org/lists/top500/2023/06/. [4]NVIDIA.NVLink&NVSwitch[EB/OL].[2023-10-01].https://www.nvidia.com/en-us/data-center/nvlink/. [5]KHODJA L Z,COUTURIER R,GIERSCH A,et al.Parallelsparse linear solver with GMRES method using minimization techniques of communications for GPU clusters[J].The Journal of Supercomputing,2014,69:200-224. [6]ROCm Software Platform.rocALUTION[EB/OL].[2023-10-01].https://github.com/ROCmSoftwarePlatform/rocALUTION. [7]NVIDIA Blog.TensorFloat-32 in the A100 GPU Accelerates AI Training,HPC up to 20x[EB/OL].(2020-05-14)[2023-10-01].https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/. [8]Intel.BFLOAT16- hardware numerics definition[EB/OL].(2018-11)[2023-10-01].https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf. [9]MICIKEVICIUS P,STOSIC D,BURGESS Net al.FP8 Formats for Deep Learning[J].arXiv:2209.05433v1,2022. [10]IOANNIDIS E I,CHEIMARIOS N,SPYROPOULOS A N,et al.On the performance of various parallel GMRES implementations on CPU and GPU clusters[J].arXiv:1906.04051,2019. [11]YAMAZAKI I,RAJAMANICKAM S,BOMAN E G,et al.Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.New Orleans(SC 14).LA,USA,2014:933-944. [12]BAHI J M,COUTURIER R,KHODJA L Z,et al.ParallelGMRES implementation for solving sparse linear systems on GPU clusters[C]//Proceedings of the 19th High Performance Computing Symposia(HPC '11).2011. [13]MATSUMOTO K,IDOMURA Y,INAT,et al.Implementation and performance evaluation of a communication-avoiding GMRES method for stencil-based code on GPU cluster[J].The Journal of Supercomputing,2019,75:8115-8146. [14]HE K,TAN S X,ZHAO H Y,et al.Parallel GMRES solver forfast analysis of large linear dynamic systems on GPU platforms[J].Integration,2016,52:10-22. [15]LACOSTE X.Scheduling and memory optimizations for sparse direct solver on multi-core/multi-gpu duster systems[C]//Distributed,Parallel,and Cluster Computing.2015. [16]LINDQUIST N,LUSZCZEK P,DONGARRA J,et al.Accelerating Restarted GMRES With Mixed Precision Arithmetic[J].IEEE Transactions on Parallel and Distributed Systems,2022,33(4):1027-1037. [17]BOUCHARD A,PARENTEAU M,LAURENDEAU É.Towarda Multi-GPU Implementation of a GMRES Solver in CHAMPS[C]//The 8th Annual Chapel Implementers and Users Workshop.2021. [18]MA W P,HU Y W,YUANW,et al.Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices[J].Mathematical Problems in Engineering:Theory,Methods and Applications,2021,2021(Pt.9):6804723.1-6804723.17. [19]DEVRIES B,IANNELLI J,TREFFTZ C,et al.Parallel Implementations of FGMRES for Solving Large,Sparse Non-symme-tric Linear Systems[J].Procedia Computer Science,2013,18:491-500. [20]ZHANG J,DENG L,LI RT,et al.Achieving high performance and portable parallel GMRES algorithm for compressible flow simulations on unstructured grids[J].The Journal of Supercomputing,2023,79:20116-20140. |
[1] | ZHENG Xu, FAN Hongjie, LIU Junfei. Pre-allocated Capacity Quota Limiting System Based on Microservice [J]. Computer Science, 2024, 51(6): 346-353. |
[2] | FENG Liao-liao, DING Yan, LIU Kun-lin, MA Ke-lin, CHANG Jun-sheng. Research Advance on BFT Consensus Algorithms [J]. Computer Science, 2022, 49(4): 329-339. |
[3] | TAN Shuang-jie, LIN Bao-jun, LIU Ying-chun, ZHAO Shuai. Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning [J]. Computer Science, 2022, 49(2): 336-341. |
[4] | LU Yong-chao, WANG Bin-yi, HU Jiang-feng, MU Yang, REN Jun-long. Research on Integrated Electronic Time Synchronization Technology [J]. Computer Science, 2021, 48(6A): 629-632. |
[5] | LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12. |
[6] | FAN Jian-feng, LI Yi, WU Wen-yuan, FENG Yong. Double Blockchain Based Station Dynamic Loop Information Monitoring System [J]. Computer Science, 2019, 46(12): 155-164. |
[7] | LI Yan, MA Jun-ming, AN Bo, CAO Dong-gang. Web Based Lightweight Tool for Big Data Processing and Visualization [J]. Computer Science, 2018, 45(9): 60-64. |
[8] | MA Liang-li and LIU Qing. Researches of Redundancy Coding Technologies on Reducing Reconstruction Data Amount [J]. Computer Science, 2017, 44(Z6): 463-469. |
[9] | HUANG Yong WU Jin-zhao. Security Model for Distributed System Based on Seal Calculus [J]. Computer Science, 2015, 42(7): 178-181. |
[10] | PAN Zhong-qiang and CHANG Xin-feng. Efficient Key Management Scheme for Data-centric Storage Wireless Sensor Networks [J]. Computer Science, 2014, 41(Z11): 277-281. |
[11] | WANG Yu,ZHAO Yue-long and HOU Fang. Minimum Redundancy Storage Regeneration Code Research MSRRC Based on Matrix Operation [J]. Computer Science, 2014, 41(Z11): 191-194. |
[12] | REN Guo-chao,WANG Jiang and MA Xiao-xing. ConUp:SCA Middleware with Dynamic Component Updating Support [J]. Computer Science, 2014, 41(9): 60-62. |
[13] | HE Pan, YUAN Yue and WU Kai-gui. Optimal Multi-objective Monitoring Resources Allocation in Distributed Systems [J]. Computer Science, 2014, 41(5): 64-67. |
[14] | ZHANG Ya-jun,LI Zhou-jun,LIAO Xiang-ke,JIANG Rui-cheng and LI Hai-feng. Survey of Automated Whitebox Fuzz Testing [J]. Computer Science, 2014, 41(2): 7-10. |
[15] | HUANG Xiang and CHEN Zhi-gang. Performance Modeling Approach for Resource Sensitive Distributed Systems [J]. Computer Science, 2013, 40(9): 174-181. |
|