Computer Science ›› 2020, Vol. 47 ›› Issue (8): 87-92. doi: 10.11896/jsjkx.191000011


Large Scalability Method of 2D Computation on Shenwei Many-core

ZHUANG Yuan, GUO Qiang, ZHANG Jie, ZENG Yun-hui   

  1. Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China
    Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250101, China
    Shandong Provincial Key Laboratory of Computer Networks, Jinan 250101, China
  • Online: 2020-08-15 Published: 2020-08-10
  • About author: ZHUANG Yuan, born in 1991, postgraduate, engineer. His main research interests include high performance computing.
    ZENG Yun-hui, born in 1975, Ph.D, researcher, is a member of China Computer Federation. His main research interests include numerical simulation and high performance computing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016YFB0201100).

Abstract: With the development of supercomputers and their programming environments, multilevel parallelism on heterogeneous architectures has become a promising trend, and applications ported to Sunway TaihuLight are typical examples. Since Sunway TaihuLight opened to the public in 2016, many researchers have studied programming methods and verified applications on it, so considerable experience with Shenwei many-core programming has accumulated. However, when the CESM model is ported to the Shenwei many-core architecture, some two-dimensional computations in the ported POP model perform well with 1024 processes but run much slower than the original version with 16800 processes, where false acceleration ratios appear. To address this problem, a new parallel method based on slave-core partitions is proposed. In this method, the 64 slave-cores in a core-group are divided into several disjoint small partitions, so that different, independent computing kernels can run on different slave-core partitions simultaneously. Each kernel is loaded onto a partition suited to its data size and computational load, and the number of slave-cores in each partition is set according to the computing scale, so the computing capability of the slave-cores can be fully utilized. Combined with loop merging and function expansion, the method keeps the slave-cores fully occupied and hides part of the computing time among code sections that run in parallel. It also effectively enlarges the parallel granularity of the kernels to be athreaded. With these methods, the simulation speed of the POP model in the high-resolution CESM G-compset is improved by 0.8 simulated years per day on 1.1 million cores.
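
The partitioning idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `partition_slave_cores`, the kernel names, and the proportional sizing rule are assumptions made here for illustration. The abstract only states that the 64 slave-cores are split into disjoint partitions sized according to the computing scale.

```python
from typing import Dict

SLAVE_CORES_PER_GROUP = 64  # per the abstract: 64 slave-cores in a core-group


def partition_slave_cores(workloads: Dict[str, float],
                          total_cores: int = SLAVE_CORES_PER_GROUP
                          ) -> Dict[str, range]:
    """Split core IDs 0..total_cores-1 into disjoint contiguous partitions.

    Hypothetical sizing rule (an assumption, not from the paper): every
    kernel gets one core, and the remaining cores are shared out in
    proportion to each kernel's estimated workload.
    """
    if not workloads or len(workloads) > total_cores:
        raise ValueError("need between 1 and total_cores kernels")
    if any(w <= 0 for w in workloads.values()):
        raise ValueError("workloads must be positive")

    names = list(workloads)
    total = sum(workloads.values())
    spare = total_cores - len(names)

    # Ideal fractional share of the spare cores for each kernel.
    shares = {n: spare * workloads[n] / total for n in names}
    counts = {n: 1 + int(shares[n]) for n in names}

    # Hand out cores lost to rounding, largest fractional part first.
    leftover = total_cores - sum(counts.values())
    by_fraction = sorted(names, key=lambda n: shares[n] - int(shares[n]),
                         reverse=True)
    for n in by_fraction[:leftover]:
        counts[n] += 1

    # Assign contiguous, non-overlapping ID ranges.
    partitions: Dict[str, range] = {}
    start = 0
    for n in names:
        partitions[n] = range(start, start + counts[n])
        start += counts[n]
    return partitions


if __name__ == "__main__":
    # Three independent 2D kernels with made-up relative costs.
    for kernel, cores in partition_slave_cores(
            {"kernel_a": 5.0, "kernel_b": 2.0, "kernel_c": 1.0}).items():
        print(kernel, "-> slave-cores", cores.start, "to", cores.stop - 1)
```

Under this rule a small 2D kernel occupies only a few slave-cores while other independent kernels use the rest, which is the effect the abstract attributes to running different kernels on different partitions at the same time.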

Key words: 2D-array computation, Large scalability, Parallel granularity, Shenwei many-core, Slave-core partition

CLC Number: 

  • TP391