Computer Science ›› 2020, Vol. 47 ›› Issue (8): 87-92. doi: 10.11896/jsjkx.191000011

Special topic: High Performance Computing


Large Scalability Method of 2D Computation on Shenwei Many-core

ZHUANG Yuan, GUO Qiang, ZHANG Jie, ZENG Yun-hui

  1. Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China
    Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250101, China
    Shandong Provincial Key Laboratory of Computer Networks, Jinan 250101, China
  • Publication date: 2020-08-15 Release date: 2020-08-10
  • Corresponding author: ZENG Yun-hui (zengyh@sdas.org)
  • About author: zhuangy@sdas.org
    ZHUANG Yuan, born in 1991, postgraduate, engineer. His main research interests include high performance computing.
    ZENG Yun-hui, born in 1975, Ph.D., researcher, is a member of China Computer Federation. His main research interests include numerical simulation and high performance computing.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2016YFB0201100).

Abstract: As supercomputers and their programming environments evolve, multilevel parallel programming on heterogeneous architectures is becoming the trend, and the domestically built Sunway TaihuLight supercomputer is a typical example. Since Sunway TaihuLight went into operation in 2016, many researchers in China and abroad have carried out method studies and application validation on it, accumulating a rich set of many-core programming and optimization methods for the Shenwei environment. However, when the Community Earth System Model (CESM) was ported to the Shenwei many-core environment, some two-dimensional data computations in the ocean component model POP did not scale: common many-core optimization methods gave good speedup at 1024 processes, but at 16800 processes the many-core version ran slower than the original version, that is, it showed negative speedup. To address this problem, this paper proposes a parallel computing method based on slave-core partitions. The 64 slave cores in a core group are divided into several disjoint slave-core partitions, and multiple independently computable code segments are assigned to different partitions and run simultaneously. This makes effective use of the slave cores' computing capability and hides part of the computing time of the independent code segments. The number of slave cores in each partition and their core IDs can be chosen according to the computing task to be assigned, so that each slave core receives a suitable data volume and computational load. On top of slave-core partitioning, loop merging and function hoisting are used to increase the parallel granularity of the kernels to be athreaded, which improves the scalability of two-dimensional data computation at large process counts. In the high-resolution CESM G-compset, the simulation speed of the POP component model improves by 0.8 simulated years per day at a scale of 1.1 million cores, a clear many-core speedup.

Key words: 2D-array computation, Large scalability, Parallel granularity, Shenwei many-core, Slave-core partition
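
For illustration only, the slave-core partitioning described in the abstract can be modelled in a few lines of Python. This is a host-side sketch of the bookkeeping, not the authors' implementation and not Sunway athread code. The kernel names (`kernel_a`, `kernel_b`, `kernel_c`), the workload weights, the proportional sizing rule, and the row-wise split of each 2D array are all assumptions made for the example; only the figure of 64 slave cores per core group comes from the abstract.

```python
from typing import Dict, List, Tuple

SLAVE_CORES_PER_GROUP = 64  # one core group has 64 slave cores (from the abstract)


def partition_slave_cores(work: Dict[str, int],
                          n_cores: int = SLAVE_CORES_PER_GROUP) -> Dict[str, List[int]]:
    """Split core IDs 0..n_cores-1 into disjoint partitions, one per kernel.

    `work` maps a hypothetical kernel name to a positive relative workload.
    Every kernel gets at least one core; the remaining cores are shared out
    roughly in proportion to workload (an assumed sizing rule).
    """
    if not work or len(work) > n_cores:
        raise ValueError("need between 1 and n_cores kernels")
    if any(w <= 0 for w in work.values()):
        raise ValueError("workloads must be positive")
    total = sum(work.values())
    spare = n_cores - len(work)
    shares = {k: spare * w / total for k, w in work.items()}
    sizes = {k: 1 + int(shares[k]) for k in work}
    # Hand the few cores lost to rounding to the largest fractional shares.
    leftover = n_cores - sum(sizes.values())
    by_fraction = sorted(work, key=lambda k: shares[k] - int(shares[k]), reverse=True)
    for k in by_fraction[:leftover]:
        sizes[k] += 1
    # Assign contiguous, non-overlapping core ID ranges.
    partitions: Dict[str, List[int]] = {}
    next_id = 0
    for k in work:
        partitions[k] = list(range(next_id, next_id + sizes[k]))
        next_id += sizes[k]
    return partitions


def task_for_core(core_id: int,
                  partitions: Dict[str, List[int]],
                  n_rows: Dict[str, int]) -> Tuple[str, range]:
    """Return the kernel a slave core runs and the rows of that kernel's 2D array it owns."""
    for kernel, cores in partitions.items():
        if core_id in cores:
            local = cores.index(core_id)  # rank of this core inside its partition
            lo = local * n_rows[kernel] // len(cores)
            hi = (local + 1) * n_rows[kernel] // len(cores)
            return kernel, range(lo, hi)
    raise ValueError(f"core {core_id} is not in any partition")


# Three independent (hypothetical) kernels with relative workloads 2 : 1 : 1.
parts = partition_slave_cores({"kernel_a": 2, "kernel_b": 1, "kernel_c": 1})
rows = {"kernel_a": 100, "kernel_b": 60, "kernel_c": 60}
for cid in (0, 31, 32, 63):
    kernel, owned = task_for_core(cid, parts, rows)
    print(cid, kernel, owned.start, owned.stop)
```

In this model each of the three kernels runs on its own group of slave cores at the same time, so a kernel whose 2D array is too small to keep all 64 cores busy occupies only part of the core group while the other partitions work on other kernels.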

CLC Number: 

  • TP391