Computer Science ›› 2022, Vol. 49 ›› Issue (6): 108-118.doi: 10.11896/jsjkx.210300259

• High Performance Computing • Previous Articles     Next Articles

GPU-based Parallel DILU Preconditioning Technique

WANG Jin, LIU Jiang   

  1. 1 Chongqing Institute of Green and Intelligent Technology,Chinese Academy of Sciences,Chongqing 400714,China
    2 School of Computer Science and Technology,University of Chinese Academy of Sciences,Beijing 100049,China
  • Received:2021-03-25 Revised:2021-07-27 Online:2022-06-15 Published:2022-06-08
  • About author:WANG Jin,born in 1995,postgraduate.His main research interests include pa-rallel computing and iterative methods for linear systems.
    LIU Jiang,born in 1979,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include computability theory,formal methods and computer algorithms.
  • Supported by:
    National Natural Science Foundation of China(61672488) and National Key R & D Program of China(2018YFC0116704).

Abstract: Large sparse linear equations often appear in scientific computation and engineering.There are many iterative methods and preconditioning techniques for solving these linear equations.Diagonal-based incomplete LU (DILU) is a preconditioning technique similar to incomplete LU (ILU) factorization.DILU is applied in OpenFOAM,an open source computational fluid dynamics software,and is a very important preconditioning technique in OpenFOAM.DILU has not received extensive attention outside OpenFOAM,and there is no complete GPU-based implementation so far.This paper compares DILU preconditioned BiCGStab with ILU preconditioned BiCGStab,and the time elapses in preconditioner constructions.The numeric experiments suggest that DILU may be more efficient and stable than ILU.As for GPU-based parallel implementations,this paper discusses two parallel schemes,that are level-set scheme and synchronization-free scheme,and gives related algorithms and some codes under these two parallel schemes.It compares the performances of DILU preconditioning technique under two parallel schemes.The numeric results show that each scheme has its own advantages and disadvantages in different equations,and we can select one according to their performances in practice.This paper compares the performance of DILU preconditioning on GPU and CPU,and the results show that GPU is more competitive.The applications that have performance bottlenecks on linear systems solutions can be improved by moving to GPU platforms.

Key words: CUDA, DILU, GPU, Iterative methods, Open FOAM, Preconditioning, Sparse triangular equations

CLC Number: 

  • TP391.9
[1] SAAD Y.Iterative Methods for Sparse Linear Systems[M/OL].SIAM,2003.https://www-users.cse.umn.edu/~saad/itermethbook_2nded.pdf.
[2] GUTKNECHT M H.A brief introduction to Krylov spacemethods for solving linear systems[M]//Frontiers of Computational Science.Berlin:Springer,2007:53-62.
[3] LANGR D,TVRDIK P.Evaluation Criteria for Sparse Matrix Storage Formats[J].IEEE Transactions on Parallel and Distri-buted Systems,2016,27(2):428-440.
[4] DAVIS T A,RAJAMANICKAM S,SID-LAKHDAR W M.ASurvey of Direct Methods for Sparse Linear Systems[J].Acta Numerica,2016,25:383-566.
[5] HACKBUSCH W.Iterative Solution of Large Sparse Systems of Equations (2nd ed)[M].Berlin:Springer,2016.
[6] FILIPPONE S,CARDELLINI V,BARBIERI D,et al.SparseMatrix-vector Multiplication on GPGPUs[J].ACM Transactions on Mathematical Software (TOMS),2017,43(4):30.
[7] STEINBERGER M,DERLERY A,ZAYER R,et al.How Naive is Naive SpMV on the GPU?[C]//2016 IEEE High Perfor-mance Extreme Computing Conference(HPEC).IEEE,2016:1-8.
[8] GAO J,QI P,HE G.Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU[J].Mathematical Problems in Enginee-ring,2016,2016:1-14.
[9] BELL N,GARLAND M.Efficient sparse matrix-vector multiplication on CUDA:Nvidia Technical Report:NVR-2008-004[R].Nvidia Corporation,2008.
[10] BARRETT R,BERRY M W,CHAN T F,et al.Templates forthe Solution of Linear Systems:Building Blocks for Iterative Methods[M].SIAM Other Titles in Applied Mathematics,1994.
[11] BEHRENS T.OpenFOAM’s basic solvers for linear systems of equations[J/OL].Chalmers,Department of Applied Mechanics,2009,18(2).http://www.tfd.chalmers.se/~hani/kurser/os_cfd_2008/timberhrens/tibeh-report-fin.pdf.
[12] OpenFOAM Project[EB/OL].https://openfoam.com/.
[13] OWENS J D,HOUSTON M,LUEBKE D,et al.GPU computing[J].Proceedings of the IEEE,2008,96(5):879-899.
[14] MEIJERINK J A,VAN DER VORST H A.An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix[J].Mathematics of Computation,1977,31(137):148-162.
[15] KERSHAW D S.The incomplete Cholesky-conjugate gradient method for the iterative solution of systems of linear equations[J].Journal of Computational Physics,1978,26(1):43-65.
[16] SAAD Y.ILUT:A dual threshold incomplete LU factorization[J].Numerical Linear Algebra with Applications,1994,1(4):387-402.
[17] POMMERELL C.Solution of large unsymmetric systems of li-near equations[D].ETH Zürich:Swiss Federal Institute of Technology Zurich,1992.
[18] ANDERSON E,SAAD Y.Solving Sparse Triangular LinearSystem on Parallel Computers[J].International Journal of High Speed Computing,1989,1(1):73-95.
[19] NAUMOV M.Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU:NVIDIA Technical Report:NVR-2011-001[R].2011.
[20] LI R,SAAD Y.GPU-accelerated preconditioned iterative linear solvers[J].The Journal of Supercomputing,2013,63(2):443-466.
[21] NAUMOV M,CASTONGUAY P,COHEN J.Parallel GraphColoring with Applications to the Incomplete-LU Factorization on the GPU:NVIDIA Technical Report:NVR-2015-001[R].2015.
[22] LIU W,LI A,HOGG J,et al.A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves[C]//European Conference on Parallel Processing.Cham:Springer,2016:617-630.
[23] LI R.On parallel solution of sparse triangular linear systems in CUDA[J].arXiv:1710.04985,2017.
[24] SU J,ZHANG F,LIU W,et al.CapelliniSpTRSV:A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs[C]//49th International Conference on Parallel Processing-ICPP.2020.
[25] XIE C,CHEN J,FIROZ J S,et al.Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures[J].arXiv:.06959,2020.
[26] ANZT H,RIBIZEL T,FLEGAR G,et al.ParILUT-A Parallel Threshold ILU for GPUs[C]//2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).2019:231-241.
[27] CHEN Y,TIAN X,LIU H,et al.Parallel ILU preconditioners in GPU computation[J].Soft Computing,2018,22(24):8187-8205.
[28] NAUMOV M.Incomplete-LU and Cholesky preconditioned ite-rative methods using CUSPARSE and CUBLAS[J/OL].Nvidia white paper,2011:1-16.https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.8934&rep=rep1&type=pdf.
[29] BNA S,SPISSOA I,OLESENB M,et al.PETSc4FOAM:A Library to plug-in PETSc into the OpenFOAM Framework[J/OL].PRACE White paper,2020.https://prace-ri.eu/wp-content/uploads/WP294-PETSc4FOAM-A-Library-to-plug-in-PETSc-into-the-OpenFOAM-Framework.pdf.
[30] Paralution project[EB/OL].http://www.paralution.com.
[31] SAAD Y,SCHULTZ M H.GMRES:A generalized minimal residual algorithm for solving nonsymmetric linear systems[J].SIAM Journal on Scientific and Statistical Computing,1986,7(3):856-869.
[32] VAN DER VORST H A.Bi-CGSTAB:A fast and smoothlyconverging variant of Bi-CG for the solution of nonsymmetric linear systems[J].SIAM Journal on Scientific and Statistical Computing,1992,13(2):631-644.
[33] JENSEN T R,TOFT B.Graph Coloring Problems[M/OL].1995.https://doi.org/10.3929/ethz-a-000669614.
[34] SALTZ J H.Aggregation Methods for Solving Sparse Triangular Systems on Multiprocessors[J].SIAM Journal on Scientific and Statistical Computing.1990,11(1):123-144.
[35] KNUTH D E.The art of computer programming,volume 3:(2nd ed.) sorting and searching[M].Addison Wesley Longman Publishing Co.,Inc.,1998,17(4):324.
[36] CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to algorithms second edition[M].MIT Press,2001.
[37] HORN R A,JOHNSON C R.The Hadamard product[M]//Topics in Matrix Analysis:Cambridge University Press,1991:298-381.
[1] GUO Zheng-wei, FU Ze-wen, LI Ning, BAI Lan. Study on Acceleration Algorithm for Raw Data Simulation of High Resolution Squint Spotlight SAR [J]. Computer Science, 2022, 49(8): 178-183.
[2] ZONG Di-di, XIE Yi-wu. Model Medial Axis Generation Method Based on Normal Iteration [J]. Computer Science, 2022, 49(6A): 764-770.
[3] LI Fan, YAN Xing, ZHANG Xiao-yu. Optimization of GPU-based Eigenface Algorithm [J]. Computer Science, 2021, 48(4): 197-204.
[4] HU Rong, YANG Wang-dong, WANG Hao-tian, LUO Hui-zhang, LI Ken-li. Parallel WMD Algorithm Based on GPU Acceleration [J]. Computer Science, 2021, 48(12): 24-28.
[5] WEN Min-hua, WANG Shen-peng, WEI Jian-wen, LI Lin-ying, ZHANG Bin, LIN Xin-hua. DGX-2 Based Optimization of Application for Turbulent Combustion [J]. Computer Science, 2021, 48(12): 43-48.
[6] WANG Liang, ZHOU Xin-zhi, YNA Hua. Real-time SIFT Algorithm Based on GPU [J]. Computer Science, 2020, 47(8): 105-111.
[7] LIU Shi-fang, ZHAO Yong-hua, YU Tian-yu, HUANG Rong-feng. Efficient Implementation of Generalized Dense Symmetric Eigenproblem StandardizationAlgorithm on GPU Cluster [J]. Computer Science, 2020, 47(4): 6-12.
[8] ZUO Xian-yu, ZHANG Zhe, SU Yue-han, LIU Yang, GE Qiang, TIAN Jun-feng. Extraction Algorithm of NDVI Based on GPU Multi-stream Parallel Model [J]. Computer Science, 2020, 47(4): 25-29.
[9] XU Xin-peng, HU Bin-xing. Fast Calculation Method of Aircraft Component Strength Check Based on ICCG [J]. Computer Science, 2020, 47(11A): 624-627.
[10] KANG Lin-yao, TANG Bing, XIA Yan-min, ZHANG Li. GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm [J]. Computer Science, 2019, 46(8): 106-110.
[11] ZHENG Hong-bo, SHI Hao, DU Yi-cheng, ZHANG Mei-yu, QIN Xu-jia. Fast Stripe Extraction Method for Structured Light Images with Uneven Illumination [J]. Computer Science, 2019, 46(5): 272-278.
[12] ZHANG Yi-ran, CHEN Long, AN Xiang-zhe, YAN Shen-gen. Study on Performance Optimization of Reduction Algorithm Targeting GPU Computing Platform [J]. Computer Science, 2019, 46(2): 306-314.
[13] LU Xian-hua, WANG Hong-jun. Design of Distributed News Clustering System Based on Big Data Computing Framework [J]. Computer Science, 2019, 46(11A): 220-223.
[14] ZHU Chao, WU Su-ping. Parallel Harris Feature Point Detection Algorithm [J]. Computer Science, 2019, 46(11A): 289-293.
[15] SU Qing-hua, FU Jing-chao, GU Han, ZHANG Shan-shan, LI Yi-fei, JIANG Fang-zhou, BAI Han-lin, ZHAO Di. Parallel Algorithm Design for Assisted Diagnosis of Prostate Cancer [J]. Computer Science, 2019, 46(11A): 524-527.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!