计算机科学 (Computer Science), 2020, Vol. 47, Issue 1: 24-30. doi: 10.11896/jsjkx.181102176
李芳1,李志辉2,徐金秀1,范昊1,褚学森3,李新亮4
LI Fang1,LI Zhi-hui2,XU Jin-xiu1,FAN Hao1,CHU Xue-sen3,LI Xin-liang4
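Abstract: The domestic many-core processor offers two many-core-level parallel programming languages that differ considerably in porting difficulty. Because fluid dynamics software packages differ in how well they adapt to the many-core architecture, different packages suit different programming languages during porting and optimization. This paper first introduces the architecture, programming model, and parallel programming languages of the domestic many-core processor. It then analyzes the challenges that fluid dynamics software faces on this processor, including data dependences introduced by implicit schemes, the solution of large sparse linear systems, multigrid methods, and unstructured grids, all of which limit how well the software adapts to the many-core architecture. For each of these difficulties the paper proposes a novel optimization algorithm, and through theoretical analysis and experiments it draws conclusions about the many-core adaptability of several typical fluid dynamics packages. In practice, most fluid dynamics software adapts well to the domestic many-core processor: it can be ported automatically with the OpenACC compiler, scales to one million cores, and maintains high parallel efficiency.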