Computer Science, 2022, Vol. 49, Issue (6): 73-80. DOI: 10.11896/jsjkx.210900045

• High Performance Computing

Study on Preprocessing Algorithm for Partition Reconnection of Unstructured-grid Based on Domestic Many-core Architecture

YE Yue-jin1, LI Fang1, CHEN De-xun2, GUO Heng2, CHEN Xin1   

  1 National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214000, China
    2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2021-09-05  Revised: 2022-02-21  Online: 2022-06-15  Published: 2022-06-08
  • About author: YE Yue-jin, born in 1991, master's degree, engineer, is a member of the China Computer Federation. His main research interests include high performance computing.
    LI Fang, born in 1980, postgraduate, Ph.D., associate professor. Her main research interests include high performance computing.
  • Supported by: National High Performance Computing Foundation of China (2020YFB0204804, 2016YFB0201100).

Abstract: Efficiently handling the scattered (discrete) memory accesses of unstructured grids is a key issue for parallel algorithms and applications in scientific and engineering computing. The distributed block-reconnection optimization algorithm proposed here, designed for the domestic Sunway heterogeneous many-core architecture, maintains high computing performance when handling the unstructured sparsity that arises in applications. Based on an in-depth analysis of the on-chip communication mechanism of the many-core architecture, an efficient message grouping strategy is designed to improve the bandwidth utilization of the on-chip array of slave cores. This strategy is combined with a barrier-free data distribution algorithm to fully exploit the network performance of the domestic heterogeneous many-core architecture. Performance modeling and experimental analysis show that the average memory bandwidth achieved by the proposed algorithm exceeds 70% of the theoretical value under different memory access patterns. Compared with the serial algorithm on the master core, it achieves an average speedup of 10 times and a maximum speedup of 45 times. Application tests in different fields further demonstrate the general applicability of the algorithm.
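The abstract does not describe how the message grouping strategy works internally. The Python sketch below is a rough, hypothetical illustration of the general idea of grouping indirect accesses into messages. It is not the authors' algorithm. It makes three assumptions:

- Grid cells are split into contiguous blocks, one block per slave core.
- Each edge is processed by the core that owns its first endpoint.
- Every off-core cell read is batched into a single deduplicated message per (requesting core, owning core) pair.

The function name `group_messages` and this partitioning scheme were chosen for the example only.

```python
from collections import defaultdict


def group_messages(edges, num_cells, num_cores):
    """Batch remote cell reads into one message per (requester, owner) core pair.

    Hypothetical sketch, not the paper's algorithm: cells are split into
    contiguous blocks (one block per core), and each edge is processed by
    the core that owns its first endpoint (owner-computes rule).
    """
    block = (num_cells + num_cores - 1) // num_cores  # cells per core, rounded up

    def owner(cell):
        # Core that stores this cell under the contiguous block partition.
        return cell // block

    pending = defaultdict(set)  # (requesting core, owning core) -> cell ids
    for u, v in edges:
        requester = owner(u)
        for cell in (u, v):
            holder = owner(cell)
            if holder != requester:  # only off-core reads need a message
                pending[(requester, holder)].add(cell)

    # Sorted, deduplicated id lists let each message be served as one batch.
    return {pair: sorted(cells) for pair, cells in pending.items()}


if __name__ == "__main__":
    edges = [(0, 5), (1, 6), (2, 7), (5, 0), (3, 4)]
    print(group_messages(edges, num_cells=8, num_cores=2))
    # prints {(0, 1): [4, 5, 6, 7], (1, 0): [0]}
```

In this toy case, five individual remote reads collapse into two grouped messages. Replacing many small transfers with fewer, larger ones is consistent with the bandwidth-utilization motivation stated in the abstract. How the real algorithm partitions cells and schedules transfers on the slave-core array is not specified here.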

Key words: Barrier-free data distribution, Domestic many-core architecture, Message grouping, On-chip communication, Unstructured grid

CLC Number: TP311