Computer Science, 2020, Vol. 47, Issue (8): 32-40. doi: 10.11896/jsjkx.200500093


Parallelizing Multigrid Application Using Data-driven Programming Model

GUO Jie¹, GAO Xi-ran², CHEN Li², FU You¹, LIU Ying²

  1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
    2 State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2020-08-15   Published: 2020-08-10
  • About author: GUO Jie, born in 1996, postgraduate. His main research interests include parallel optimization and parallel compilation.
    CHEN Li, born in 1970, Ph.D., associate professor, is a member of the China Computer Federation. Her main research interests include parallel programming languages and parallelizing compiler techniques.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61521092), the National Key R&D Program of China (2016YFB0200803) and the Key R&D Project of Shandong Province (2019GGX101066).

Abstract: Multigrid is an important family of algorithms for accelerating the convergence of iterative solvers for linear systems, and it plays an important role in large-scale scientific computing. Distributed-memory systems have now evolved into large-scale systems built from multi-core nodes or from heterogeneous nodes with accelerators, and legacy applications urgently need to be ported to modern supercomputers with diverse node-level architectures. This paper introduces AceMesh, a data-driven directive-based programming language, and uses it to port NAS MG to two domestically developed supercomputers, Tianhe-2 and Sunway TaihuLight. It shows how to taskify computation loops and communication-related code in AceMesh, and analyzes the characteristics of the resulting task graph and of its computation-communication overlap. Experimental results show that, compared with traditional programming models, the AceMesh versions achieve relative speedups of up to 1.19× on Sunway TaihuLight and 1.85× on Tianhe-2. Analysis shows that the performance improvements come mainly from two sources: memory-related optimization and computation-communication overlap. Finally, future directions are proposed for further optimizing inter-process communication in the AceMesh version.
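To make concrete how multigrid accelerates an iterative solver, the following is a minimal sketch of a textbook geometric multigrid V-cycle for the 1D Poisson problem -u'' = f on (0, 1) with zero boundary values. It is not NAS MG (a 3D periodic benchmark) and not AceMesh code; the grid size, the weighted Jacobi smoother (ω = 2/3), full-weighting restriction, linear interpolation and all function names are illustrative assumptions. Each V-cycle smooths the high-frequency error on the fine grid, restricts the residual to a grid with half as many points, recursively solves for the error there, interpolates the correction back, and smooths again.

```python
"""Minimal 1D geometric multigrid V-cycle for -u'' = f on (0, 1), u(0) = u(1) = 0.

Illustrative sketch only (hypothetical names): not NAS MG and not AceMesh code.
"""


def residual(u, f, h):
    """Return r = f - A u, where A is the standard 3-point -d^2/dx^2 stencil."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0        # zero Dirichlet boundary
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps: damp the high-frequency part of the error."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u


def restrict(r):
    """Full weighting: fine grid of 2m+1 points -> coarse grid of m points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(ec):
    """Linear interpolation: coarse grid of m points -> fine grid of 2m+1 points."""
    m = len(ec)
    fine = [0.0] * (2 * m + 1)
    for j in range(m):
        fine[2 * j + 1] = ec[j]                  # coincident points are copied
    for j in range(m + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < m else 0.0
        fine[2 * j] = 0.5 * (left + right)       # in-between points are averaged
    return fine


def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k - 1 so every level halves cleanly."""
    if len(u) == 1:
        return [0.5 * h * h * f[0]]              # coarsest grid: solve 2u/h^2 = f exactly
    u = smooth(u, f, h)                          # pre-smoothing
    rc = restrict(residual(u, f, h))             # residual moved to the coarse grid
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)   # coarse-grid error equation A e = r
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]  # coarse-grid correction
    return smooth(u, f, h)                       # post-smoothing


def solve(f, h, cycles):
    """Run a fixed number of V-cycles from a zero initial guess."""
    u = [0.0] * len(f)
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return u


if __name__ == "__main__":
    n = 63                                       # 2**6 - 1 interior points
    h = 1.0 / (n + 1)
    u = solve([1.0] * n, h, cycles=10)
    exact = [x * (1.0 - x) / 2.0 for x in ((i + 1) * h for i in range(n))]
    print("max error:", max(abs(a - b) for a, b in zip(u, exact)))
```

With f = 1 the exact solution u = x(1 - x)/2 is quadratic, so the 3-point scheme reproduces it at the grid points, and a handful of V-cycles from a zero initial guess brings the error close to round-off regardless of grid size. In a distributed code such as NAS MG, the same smoothing, residual, restriction and interpolation steps become stencil loops over partitioned 3D grids that typically require halo (ghost-cell) exchanges on every level, which is why taskifying those loops and overlapping the exchanges with computation matters.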

Key words: Computation-communication overlap, Data-driven task parallel programming model, Heterogeneous many-core, MPI legacy application, Multigrid

CLC Number: TP311