Computer Science ›› 2015, Vol. 42 ›› Issue (11): 56-58. doi: 10.11896/j.issn.1002-137X.2015.11.010

• 2014 National Annual Conference on High Performance Computers •

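Parallel Implementation of Finite-element Mesh Integration Algorithm on Many Integrated Core

KOU Da-zhi, KONG Da-li

  1. Shanghai Supercomputer Center, Shanghai 201203; Department of Mathematics, University of Exeter, Exeter EX4 4QF
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the National High Technology Research and Development Program of China (863 Program) (2012AA01A308), the National Natural Science Foundation of China (11473014), and the Research Program of the Science and Technology Commission of Shanghai Municipality (13DZ2294500).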

Abstract: A C++ 3-D finite-element mesh integration algorithm was ported to the Intel Xeon Phi coprocessor, which is built on the Intel Many Integrated Core (MIC) architecture, and profiled on a heterogeneous CPU/MIC platform. The application exercises the core computational steps of finite-element analysis on the MIC. Programmed in offload mode[1] with explicit data copies, a sequence of key element-wise operations is parallelized with OpenMP threads running on the MIC device. The performance tests show that the MIC platform can accelerate the finite-element mesh integration algorithm effectively: 1) in terms of overall run time, one fully utilized 3115A MIC card outperforms two 8-core Intel Xeon E5-2670 CPUs (a dual-socket, 16-core configuration); 2) scalability falls short of the ideal, possibly because the concurrent physical threads on each MIC core contend for the shared cache. The tests also indicate that porting a complete MPI-parallel finite-element simulation code to a multi-CPU, multi-MIC platform is feasible, with the single-process, multi-threaded implementation presented here serving as the building block. This work should help advance high-performance scientific and engineering computing on finite-element meshes.

Key words: Many integrated core, Offload mode, Parallel, Multi-thread, Finite element
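The element-wise, thread-parallel pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the types `Node` and `Tet`, the functions `tetVolume`, `integrateField`, `unitCubeNodes` and `unitCubeTets`, and the choice of linear tetrahedral elements are all assumptions made for illustration. Only the OpenMP loop over elements is shown; the offload step (copying the mesh to the coprocessor and launching the loop there) relies on Intel compiler-specific directives and is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal mesh types (not taken from the paper).
struct Node { double x, y, z; };
using Tet = std::array<int, 4>;  // indices of the four vertices of one element

// Volume of a tetrahedron: |det[b-a, c-a, d-a]| / 6.
double tetVolume(const Node& a, const Node& b, const Node& c, const Node& d) {
    const double ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    const double vx = c.x - a.x, vy = c.y - a.y, vz = c.z - a.z;
    const double wx = d.x - a.x, wy = d.y - a.y, wz = d.z - a.z;
    const double det = ux * (vy * wz - vz * wy)
                     - uy * (vx * wz - vz * wx)
                     + uz * (vx * wy - vy * wx);
    return std::fabs(det) / 6.0;
}

// Integrate a nodal field over the mesh, one element per loop iteration.
// Each element contributes volume * (mean of its four nodal values), which is
// exact for a field that is linear within each element.
double integrateField(const std::vector<Node>& nodes,
                      const std::vector<Tet>& tets,
                      const std::vector<double>& field) {
    assert(field.size() == nodes.size());
    double total = 0.0;
    const long n = static_cast<long>(tets.size());
    // Elements are independent, so the loop is split across OpenMP threads
    // and the per-thread partial sums are combined by the reduction.
    #pragma omp parallel for reduction(+ : total) schedule(static)
    for (long e = 0; e < n; ++e) {
        const Tet& t = tets[static_cast<std::size_t>(e)];
        const double vol = tetVolume(nodes[t[0]], nodes[t[1]],
                                     nodes[t[2]], nodes[t[3]]);
        const double mean = 0.25 * (field[t[0]] + field[t[1]] +
                                    field[t[2]] + field[t[3]]);
        total += vol * mean;
    }
    return total;
}

// Test mesh: the unit cube split into six tetrahedra that share the
// diagonal from node 0 (0,0,0) to node 6 (1,1,1).
std::vector<Node> unitCubeNodes() {
    return {Node{0.0, 0.0, 0.0}, Node{1.0, 0.0, 0.0}, Node{1.0, 1.0, 0.0},
            Node{0.0, 1.0, 0.0}, Node{0.0, 0.0, 1.0}, Node{1.0, 0.0, 1.0},
            Node{1.0, 1.0, 1.0}, Node{0.0, 1.0, 1.0}};
}

std::vector<Tet> unitCubeTets() {
    return {Tet{0, 1, 2, 6}, Tet{0, 1, 5, 6}, Tet{0, 3, 2, 6},
            Tet{0, 3, 7, 6}, Tet{0, 4, 5, 6}, Tet{0, 4, 7, 6}};
}
```

Calling `integrateField` on this cube with a field of ones gives the cube volume, and with the nodal x-coordinates it gives the integral of x over the cube. Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result up to rounding; with OpenMP enabled (for example `-fopenmp` on GCC or Clang), the elements are distributed across threads.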

[1] Shen Bo, Zhang Guang-yong, Wu Shao-hua, et al. Research of Offload Parallel Method Based on MIC Platform[J]. Computer Science, 2014, 41(6A): 477-480
[2] Liu Yue-jin, Xue Meng-jun. Program of Blocks Combining with LDLT Method for Finite Element Analysis[J]. Computer Science, 2014, 41(11A): 408-409
[3] Liu Jian-hua, Wang Chao-wei, Ren Jiang-yong, et al. Mixed Precision Finite Element Algorithm on Heterogeneous Architecture[J]. Computer Science, 2012, 39(6): 293-296
[4] Wang Ying-rui, Ren Jiang-yong, Tian Rong. Efficient Sparse Matrix-vector Multiplication and CG Solver Optimization on GPU[J]. Computer Science, 2013, 40(3): 46-49
[5] Luo Li, Yang Chao, Zhao Yu-bo, et al. A scalable hybrid algorithm based on domain decomposition and algebraic multigrid for solving partial differential equations on a cluster of CPU/GPUs[J/OL]. http://www.cs.colorado.edu/~cai/papers/lyzc2011.pdf
[6] Chan K, Zhang Ke-ke, Li Li-gang, et al. A new generation of convection-driven spherical dynamos using EBE finite element method[J]. Physics of the Earth and Planetary Interiors, 2007, 163(1-4): 251-265
[7] Kong Da-li, Zhang Ke-ke, Gerald S, et al. A three-dimensional numerical solution for the shape of a rotationally distorted polytrope of index unity[J]. The Astrophysical Journal, 2013, 763(2): 116-126
[8] Kong Da-li. Analytical and Numerical Studies of Several Fluid Mechanical Problems[D]. University of Exeter, 2012
[9] Wang En-dong, Zhang Qing, Shen Bo, et al. High-Performance Computing on the Intel Xeon Phi-How to Fully Exploit MIC Architectures[M]. Beijing: China Water and Power Press, 2012
[10] Kong Da-li, Zhang Ke-ke, Gerald S. Shapes of two-layer models of rotating planets[J]. Journal of Geophysical Research, 2010, 115(E12003): 1-11
