基于Intel MIC架构的3D有限差分算法优化

doi:10.11896/j.issn.1002-137X.2017.05.005

Abstract

Abstract: Finite difference algorithm is a numerical discrete method based on the partial differential equation which is widely applied in elastic wave propagation simulation.Because of the high computation density,long distance memory access pattern and low CPU utilization,it becomes the performance bottleneck in practical applications.Aiming at solving above problems,this paper deliberated the key points of 3D finite difference(3DFD) algorithm and then proposed the three-step progressive method to optimize 3DFD algorithm based on Intel MIC.Firstly,the basic optimization methods,such as branch elimination,loop unroll,and invariant extraction,were proposed to reduce calculation strength and remove the obstacle of SIMD(Single Instruction Multiple Data).Secondly,by leveraging the parallel optimization methods such as data dependence analysis,loop tiling,and intrinsic SIMD instructions,it took full advantage of the mechanism of MIC coprocessor with multithreads and long vector.At last,the heterogeneous cooperative optimization methods,such as data transformation minimization and load balancing,were applied to the platform of CPU+MIC(Many Integrated Cores) which parallelizes the algorithm execution in both CPU and MIC.Experimental results show that the optimized 3DFD algorithm gains 50~120 speedup compared with original algorithm.

Key words: Finite difference algorithm,MIC architecture,SIMD,Heterogeneous cooperation,Parallel computation

HAO Xin and GUO Shao-zhong. Optimization of 3D Finite Difference Algorithm on Intel MIC[J].Computer Science, 2017, 44(5): 26-32.

References

[1] Top500.org.Top 500 supercomputer site [EB/OL].[2015-12-09].http://www.top500.org.
[2] 王恩东,张清,沈铂,等.MIC高性能计算编程指南[M].北京:中国水利水电出版社,2012:15-21.
[3] ZHUKOV V T,KRANSON M M,NOVIKOVA ND,et al.Multigrid effectiveness on modern computing architectures[J].Programming and Computer Software,2015,2(1):14-22.
[4] REINDERS J,JEFFERS J.High Performance Parallelism Pearls,Multicore and Many-core Programming[M].Waltham,MA,USA:Morgan Kaufmann,2014:377-396.
[5] ITU L M,SUCIU C,MOLDOVEANU F,et al.GPU Optimized Computation of Stencil Based Algorithms[C]∥2011 RoEduNet International Conference 10th Edition:Networking in Education and Reasearch.2011:1-6.
[6] WANG G B,YANG X J,ZHANG Y,et al.Program Optimization of Stencil Based Application on the GPU-accelerated System[C]∥2009 IEEE International Symposium on Parallel and Distributed Processing with Applications.2009:219-225.
[7] HERNANDEZ M,CEBRIAN J M,CECILIA J M,et al.Evaluating 3-D Stencil codes on Intel Xeon Phi:Limitations and Tradeoffs[C]∥XXVI Jornadas De Paralelismo.2015:441-444.
[8] KOMATITSCH D,ERLEBACHER G,GODDEKE C,et al.High-order finite element seismic wave propagation modeling with MPI on a large GPU cluster[J].Journal of Computational physics,2010,9(20):7692-7714.
[9] FANG J B,SIPS H,ZHANG L L,et al.Test- driving Intel Xeon Phi[C]∥5th ACM/SPEC International Conference on Perfor-mance Engineering.2014:137-148.
[10] LIANG X,WANG Z Y,LIU G H,et al.Numerical simulation of elastic wave based on the staggered grid finite difference method[C]∥2011 International Conference on Consumer Electronics,Communications and Networks.2011:3283-3286.
[11] WANG T,LI L G,ZHANGY,et al.Pseudo-spectral method for modeling elastic wave propagation in isotropic medium[C]∥12th International Conference on Signal Processing.2014:58-62.
[12] WOSKOBOYNIKOVA G M.Seismic wave propagation in fractured media[C]∥Third International Forum on Strategic Technologies.2008:307-309.
[13] LIU H W,LI B,LIU H,et al.The Algorithm of high order finitedifference pre-stack reverse time migration and GPU implementation [J].Chinese Journal of Geophysics,2010,3(7):1725-1733.(in Chinese) 刘红伟,李博,刘洪,等.地震叠前逆时偏移高阶有限差分算法及GPU实现[J].地球物理学报,2010,53(7):1725-1733.
[14] PERAZA J,TIWARI A,LAURENZANO M,et al.Understan-ding the performance of stencil computations on Intel's Xeon Phi[C]∥IEEE International Conference on Cluster Computing.2013:1-5.
[15] SHAN Y,WU J P,WANG Z H.Hierarchial parallel programming model and parallelization and optimization techniques based on SMP cluster[J].Appolication Research of Computers,2006,3(10):254-256.(in Chinese) 单莹,吴建平,王正华.基于SMP集群的多层次并行编程模型与并行优化技术[J].计算机应用研究,2006,3(10):254-256.
[16] SATO M.OpenMP:Parallel Programming API for shared memo-ry multiprocessors and on-chip multiprocessors[C]∥15th International Symposium on System Synthesis.2002:109-111.
[17] JEFFERS J,REINDERS J.Intel Xeon Phi coprocessor high-performance programming [M].陈建,周珊,李慧等,译.北京:人民邮电出版社,2014:92-96.
[18] STROUT M M,LUPORINI F,KRIEGER C D,et al.Generalizing Run-time Tiling with the Loop Chain Abstraction[C]∥IEEE 28th International Parallel and Distributed Processing Symposium.2014:1136-1145.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Optimization of 3D Finite Difference Algorithm on Intel MIC

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 0

Metrics

Comments

Recommended 0