Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 518-522.

• Interdiscipline & Application • Previous Articles     Next Articles

Many-core Optimization for Sparse Triangular Solver Under Unstructured Grids

NI Hong, LIU Xin   

  1. National Research Centre of Parallel Computer Engineering and Technology,Beijing 100086,China
  • Online:2019-06-14 Published:2019-07-02

Abstract: Sparse Triangular Solver (SpTRSV),as an important algorithm in basic linear algebraic library,has been widely used in large-scale scientific computing.In unstructured-grids,because unstructured grid have the characte-ristics of data storage disorder,data depth correlation and frequent discrete-time memory access,this algorithm is difficult to achieve effective parallelism in the many-core architecture.In this paper,based on the architecture of the domestic heterogeneous multiprocessor SW26010 architecture,a general kernel optimization method based on pipelined serial and local parallel was proposed for unstructured grid computing.This method can effectively reduce random access in unstructured grid computing,improve the computing efficiency,and have the good scalability.Based on this algorithm,multiple kernel optimization is carried out for several practical applications.The experimental results show that the method can achieve more than 3 times acceleration of the single core group and significantly reduce the running time.

Key words: Heterogeneous many-core Optimization, Parallel algorithm, SpTRSV, SW26010, Unstructured-grids

CLC Number: 

  • TP311
[1]ANDERSON E,SAAD Y.Solving sparse triangular linear systems on parallel computers[J].International Journal of High Speed Computing,1989,1(1):73-95.
[2]DUFF I S,ERISMAN A M,REID J K.Direct Methods for Sparse Matrices[J].Matehematics of Computation,1986,52(185):250.
[3]SAAD Y.Iterative methods for sparse linear systems[OL].http://ieeexplore.ieee.org/ielx5/99/12132/001231631.pdf.
[4]REGULY I Z,MUDALIGE G R,BERTOLLI C,et al.Acceleration of a full-scale industrial cfd application with op2.IEEE Transactionson Parallel and Distributed Systems,2016,27(5):1265-1278.
[5]YAO C,YANG X,WANG W,et al.26 pflops stencil computations for atmospheric modeling on sunway taihulight[C]∥2017 IEEE International Parallel and Distributed Processing Sympo-sium(IPDPS).IEEE,2017:535-544.
[6]CHEN T,LI M,LI Y,et al.Mxnet:A flexible and efficient machine learning library for heterogeneous distributed systems[J].arXiv preprint arXiv:1512.01274,2015.
[7]ANDERSON E,SAAD Y.Solving sparse triangular linear systems on parallel computers[J].International Journal of High Speed Computing,1989,1(1):73-95.
[8]KALIKAR S,NASRE R.Domlock:A new multi-granularity locking technique for hierarchies[C]∥Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,2016.
[9]SCHREIBER R,TANG W P.Vectorizing the conjugate gradient method[D].Los Angeles:Stanford University,1982.
[10]LIU W,LI A,HOGG J D,et al.Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides[J].Concurrency and Computation:Practice and Experience,2017,29(21):e4244-n/a.
[11]PICCIAU A,INGGS G E,WICKERSON J,et al.Balancing locality and concurrency:Solving sparse triangular systems on gpus[C]∥2016 IEEE 23rd International Conference on High Performance Computing(HiPC).2016:183-192.
[12]敖玉龙.国产大型众核系统上稀疏矩阵和Stencil运算的性能优化关键技术研究[D].北京:中科院软件研究所,2017:27-47.
[13]王欣亮.关键稀疏数值计算核心在国产众核架构上的性能优化研究[D].北京:清华大学,2018:24-43.
[14]QI F B.Sunway TaihuLight super computer[J].Communications of the CCF,2017,13(10):16-22.
[15]ANDERSON W K,GROPP W D,KAUSHIK D K,et al.Achieving High Sustained Performance in an Unstructured Mesh CFD Application[C]∥Conference on High Performance Computing (Super Computing).1999.
[16]林恒.基于超大规模异构体系结构的图计算系统研究[D].北京:清华大学,2018:23-53.
[1] ZHU Yu, PANG Jian-min, XU Jin-long, TAO Xiao-han, WANG Jun. Adaptive Tiling Size Algorithm for 3D Stencil Computation on SW26010 Many-core Processor [J]. Computer Science, 2021, 48(6): 10-18.
[2] HE Ya-ru, PANG Jian-min, XU Jin-long, ZHU Yu, TAO Xiao-han. Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform [J]. Computer Science, 2021, 48(6): 34-40.
[3] LI Yu-rong, LIU Jie, LIU Ya-lin, GONG Chun-ye, WANG Yong. Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation [J]. Computer Science, 2020, 47(8): 49-55.
[4] LIU Xiao-nan, JING Li-na, WANG Li-xin, WANG Mei-ling. Large-scale Quantum Fourier Transform Simulation Based on SW26010 [J]. Computer Science, 2020, 47(8): 93-97.
[5] YUAN Xin-hui, LIN Rong-fen, WEI Di, YIN Wan-wang, XU Jin-xiu. Optimization of BFS on Domestic Heterogeneous Many-core Processor SW26010 [J]. Computer Science, 2020, 47(8): 98-104.
[6] LV Xiao-jing, LIU Zhao, CHU Xue-sen, SHI Shu-peng, MENG Hong-song, HUANG Zhen-chun. Extreme-scale Simulation Based LBM Computing Fluid Dynamics Simulations [J]. Computer Science, 2020, 47(4): 13-17.
[7] LI Fang,LI Zhi-hui,XU Jin-xiu,FAN Hao,CHU Xue-sen,LI Xin-liang. Research on Adaptation of CFD Software Based on Many-core Architecture of 100P Domestic Supercomputing System [J]. Computer Science, 2020, 47(1): 24-30.
[8] TAO Xiao-han, PANG Jian-min, GAO Wei, WANG Qi, YAO Jin-yang. Performance Optimization of FT Program Based on SW26010 Processor [J]. Computer Science, 2019, 46(4): 321-328.
[9] ZHU Chao, WU Su-ping. Parallel Harris Feature Point Detection Algorithm [J]. Computer Science, 2019, 46(11A): 289-293.
[10] ZHOU Jie and LI Wen-jing. Research on Parallel Algorithm of Petri Net Based on Three-layer Mixed Programming Model [J]. Computer Science, 2017, 44(Z11): 586-591.
[11] ZHU Kai-long, LU Yu-liang and ZHANG Yan-qing. Study on Calculation Method for Internet Topological Parameters Based on MapReduce [J]. Computer Science, 2017, 44(6): 80-82.
[12] TANG Bing, Laurent BOBELIN and HE Hai-wu. Parallel Algorithm of Nonnegative Matrix Factorization Based on Hybrid MPI and OpenMP Programming Model [J]. Computer Science, 2017, 44(3): 51-54.
[13] SHI Song, NING Yong-bo, LI Hong-liang and ZHENG Fang. Multi-layer Partition Hash Join Algorithm on Array-based Manycore Architecture [J]. Computer Science, 2016, 43(3): 18-22.
[14] LI Hai-peng, LI Jing-jiao, YAN Ai-yun, WANG Ai-xia and WANG Jiao. Parallel Implementation of Genetic Neural Network in Face Recognition [J]. Computer Science, 2015, 42(Z6): 168-170.
[15] BI Shuo-ben,CHEN Dong-qi,YAN Jian and GUO Yi. Planar Delaunay Triangulation Algorithm Based on 2D Convex Hull [J]. Computer Science, 2014, 41(10): 317-320.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!