Computer Science, 2020, Vol. 47, Issue (8): 87-92. doi: 10.11896/jsjkx.191000011

Large Scalability Method of 2D Computation on Shenwei Many-core

ZHUANG Yuan, GUO Qiang, ZHANG Jie, ZENG Yun-hui   

  1. Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China
    Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250101, China
    Shandong Provincial Key Laboratory of Computer Networks, Jinan 250101, China
  • Online:2020-08-15 Published:2020-08-10
  • About author: ZHUANG Yuan, born in 1991, postgraduate, engineer. His main research interests include high performance computing.
    ZENG Yun-hui, born in 1975, Ph.D., researcher, is a member of China Computer Federation. His main research interests include numerical simulation and high performance computing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016YFB0201100).

Abstract: With the development of supercomputers and their programming environments, multilevel parallelism on heterogeneous architectures has become a promising trend, and applications ported to Sunway TaihuLight are typical examples. Since Sunway TaihuLight opened to the public in 2016, many researchers have studied programming methods and verified applications on it, so considerable experience with Shenwei many-core programming has accumulated. However, when the CESM model was ported to the Shenwei many-core architecture, some two-dimensional computations in the ported POP model performed well under 1,024 processes but ran much slower than the original version under 16,800 processes, where false acceleration ratios appeared. To address this problem, a new parallel method based on slave-core partitions is proposed. The 64 slave cores in a core group are divided into several disjoint small partitions, so that different, independent computing kernels can run on different partitions simultaneously. Each kernel is loaded onto a partition suited to its data size and computational load, and the number of slave cores in each partition can be set according to the computing scale, so the computing capability of the slave cores is fully utilized. By combining the new method with loop combination and function expansion, the slave cores are kept fully occupied and part of the computing time of code sections running in parallel is hidden. Furthermore, extending the parallel granularity of the kernels to be athreaded proves effective. With these methods, the simulation speed of the POP model in the high-resolution CESM G-compset is improved by 0.8 simulation years per day on 1.1 million cores.
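
The slave-core partitioning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which targets the Shenwei athread environment. It is a plain Python model that assumes partition sizes are chosen in proportion to each kernel's estimated load. The function name `partition_slave_cores`, the kernel names, and the proportional sizing rule are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

SLAVE_CORES_PER_GROUP = 64  # slave cores in one core group, as stated in the abstract


def partition_slave_cores(loads: Dict[str, float],
                          total_cores: int = SLAVE_CORES_PER_GROUP) -> Dict[str, List[int]]:
    """Split core IDs 0..total_cores-1 into disjoint, contiguous partitions,
    one per kernel, sized roughly in proportion to each kernel's load.

    The proportional rule is an assumption of this sketch, not the paper's rule.
    """
    if not loads:
        raise ValueError("at least one kernel is required")
    if len(loads) > total_cores:
        raise ValueError("more kernels than slave cores")
    if any(load <= 0 for load in loads.values()):
        raise ValueError("loads must be positive")

    total_load = sum(loads.values())
    # Every kernel gets one core up front; the remaining cores are shared by load.
    spare = total_cores - len(loads)
    ideal = {k: spare * load / total_load for k, load in loads.items()}
    sizes = {k: 1 + int(ideal[k]) for k in loads}

    # Hand out the cores lost to rounding down, largest fractional part first.
    leftover = total_cores - sum(sizes.values())
    by_remainder = sorted(loads, key=lambda k: ideal[k] - int(ideal[k]), reverse=True)
    for k in by_remainder[:leftover]:
        sizes[k] += 1

    # Assign contiguous, non-overlapping ID ranges in the order kernels were given.
    partitions: Dict[str, List[int]] = {}
    start = 0
    for k in loads:
        partitions[k] = list(range(start, start + sizes[k]))
        start += sizes[k]
    return partitions


if __name__ == "__main__":
    # Hypothetical kernels: one twice as heavy as the other two.
    parts = partition_slave_cores({"kernel_a": 2, "kernel_b": 1, "kernel_c": 1})
    for name, cores in parts.items():
        print(name, len(cores), "cores:", cores[0], "to", cores[-1])
```

In this model each independent kernel owns its own set of core IDs, so the kernels could be launched side by side instead of each one spreading a small 2D workload thinly over all 64 slave cores.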

Key words: 2D-array computation, Shenwei many-core, Large scalability, Slave-core partition, Parallel granularity

CLC Number: TP391