HAO Xin, GUO Shaozhong. Optimization of 3D Finite Difference Algorithm on Intel MIC [J]. Computer Science (计算机科学), 2017, 44(5): 26-32
Optimization of 3D Finite Difference Algorithm on Intel MIC Architecture
Received: 2016-04-30  Revised: 2016-08-27
DOI:10.11896/j.issn.1002-137X.2017.05.005
Keywords: Finite difference algorithm, MIC architecture, vectorization (SIMD), heterogeneous cooperation, parallel computing
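Author  Affiliation
HAO Xin  State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002
GUO Shaozhong  State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002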
Abstract:
      The finite difference algorithm is a numerical discretization method for partial differential equations and is widely used in the numerical simulation of elastic wave propagation. Its long-stride memory access pattern, high computational density, and low CPU utilization make it a performance bottleneck in practical applications. To address these problems, this paper analyzes the 3D finite difference (3DFD) algorithm in detail and optimizes it for the Intel MIC (Many Integrated Cores) architecture with a three-step progressive method. First, basic optimizations such as branch elimination, loop unrolling, and loop-invariant hoisting reduce the computational cost and remove obstacles to vectorization. Second, parallel optimizations, including data dependence analysis, loop tiling, and rewriting the core kernel with vector (SIMD, Single Instruction Multiple Data) intrinsics, exploit the many threads and wide vector units of the MIC coprocessor. Finally, on a heterogeneous many-core platform (CPU+MIC), heterogeneous cooperative optimizations such as minimizing data transfer and load balancing let the CPU and the MIC compute in parallel. Experiments show that the optimized algorithm achieves a 50x to 120x speedup over the original algorithm on the heterogeneous platform.
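
The basic and loop-level optimizations named in the abstract can be illustrated with a small sketch. It is written in plain Python for readability, not in the C and vector intrinsics a MIC implementation would use. The 7-point stencil, the coefficient names c0 and c1, the function names, and the tile size are all illustrative assumptions; the abstract does not give the paper's actual kernel. The first function keeps a boundary test inside the innermost loop. The second removes that branch by restricting the loop bounds to interior points, hoists the strides and row offset out of the inner loop, and tiles the y loop.

```python
# Sketch only: a 7-point stencil stands in for the paper's 3DFD kernel,
# whose exact form is not given in the abstract (assumption).
# The grid is stored as a flat list with x as the fastest-varying index.

def step_naive(u, nx, ny, nz, c0, c1):
    """Reference version: boundary test inside the innermost loop."""
    out = list(u)
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                # This branch runs for every point and blocks vectorization.
                if 0 < x < nx - 1 and 0 < y < ny - 1 and 0 < z < nz - 1:
                    i = (z * ny + y) * nx + x
                    out[i] = c0 * u[i] + c1 * (
                        u[i - 1] + u[i + 1]
                        + u[i - nx] + u[i + nx]
                        + u[i - nx * ny] + u[i + nx * ny]
                    )
    return out


def step_tiled(u, nx, ny, nz, c0, c1, tile=4):
    """Branch-free version with hoisted invariants and a tiled y loop."""
    out = list(u)
    sy = nx          # stride between rows, hoisted out of all loops
    sz = nx * ny     # stride between planes, hoisted out of all loops
    for y0 in range(1, ny - 1, tile):        # tile the y dimension
        y1 = min(y0 + tile, ny - 1)
        for z in range(1, nz - 1):           # interior planes only
            for y in range(y0, y1):
                base = z * sz + y * sy       # row offset, hoisted from x loop
                for x in range(1, nx - 1):   # unit stride, no boundary test
                    i = base + x
                    out[i] = c0 * u[i] + c1 * (
                        u[i - 1] + u[i + 1]
                        + u[i - sy] + u[i + sy]
                        + u[i - sz] + u[i + sz]
                    )
    return out
```

Both functions visit the same interior points and evaluate the same expression, so they return identical grids for any tile size. In a compiled implementation, the branch-free unit-stride inner loop is the part a compiler or hand-written intrinsics would vectorize, and the tiles are the natural units to distribute across threads.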