Computer Science, 2015, Vol. 42, Issue (1): 75-78. doi: 10.11896/j.issn.1002-137X.2015.01.017

Performance Portability Evaluation for OpenACC on Intel Knights Corner and NVIDIA Kepler

WANG Yi-chao, QIN Qiang, SEE Simon and LIN Xin-hua   

Online: 2018-11-14  Published: 2018-11-14

Abstract: OpenACC is a programming standard designed to simplify heterogeneous parallel programming through compiler directives. Since OpenACC compilers can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying micro-architectures. This paper studied how realistic it is to use a single OpenACC source code across a set of hardware platforms with different underlying micro-architectures. Intel Knights Corner and NVIDIA Kepler products were the targets in the experiment, since they have the latest architectures and similar peak performance. The CAPS OpenACC compiler was used to compile the EPCC OpenACC benchmark suite and the Stream and MaxFlops benchmarks of the SHOC suite to assess performance. To study performance portability, a roofline model and a relative performance model were built from the experimental data. The results show that specific benchmarks achieve at most 82% of peak performance on Kepler and Knights Corner, but as arithmetic intensity rises, the average performance is approximately 10% of peak. There is also a large performance gap between Intel Knights Corner and NVIDIA Kepler on several benchmarks. This study confirmed that the performance portability of OpenACC is related to arithmetic intensity, and that a large performance gap still exists on specific benchmarks between different hardware platforms.

Key words: OpenACC, Performance portability, High performance computing
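
The abstract relies on the roofline model and on arithmetic intensity without defining them. The following Python sketch shows the standard roofline bound, attainable performance = min(peak FLOP/s, arithmetic intensity x memory bandwidth), plus two ways of expressing achieved performance as a fraction. It is illustrative only: the `Machine` class, all function names and all numbers are hypothetical, they are not the paper's hardware figures, and the paper's exact "relative performance model" is not given in the abstract, so the two ratio functions are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical accelerator described only by two roofline parameters."""
    name: str
    peak_gflops: float    # peak floating-point throughput, GFLOP/s
    bandwidth_gbs: float  # peak (or STREAM-measured) memory bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) at which the roof turns flat."""
        return self.peak_gflops / self.bandwidth_gbs


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte transferred to or from main memory."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(m: Machine, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth slope."""
    return min(m.peak_gflops, intensity * m.bandwidth_gbs)


def fraction_of_peak(measured_gflops: float, m: Machine) -> float:
    """Assumed metric: achieved performance relative to peak compute."""
    return measured_gflops / m.peak_gflops


def fraction_of_roof(measured_gflops: float, m: Machine, intensity: float) -> float:
    """Assumed metric: achieved performance relative to the roofline bound."""
    return measured_gflops / attainable_gflops(m, intensity)


if __name__ == "__main__":
    # Hypothetical device, not a Kepler or Knights Corner specification.
    dev = Machine("hypothetical device", peak_gflops=1000.0, bandwidth_gbs=200.0)
    # STREAM triad a[i] = b[i] + s * c[i] in double precision:
    # 2 FLOPs per element, 3 x 8 bytes of traffic (write-allocate ignored).
    triad_ai = arithmetic_intensity(flops=2, bytes_moved=24)
    print(f"ridge point: {dev.ridge_point:.1f} FLOP/byte")        # 5.0
    print(f"triad roof: {attainable_gflops(dev, triad_ai):.2f} GFLOP/s")  # 16.67
```

A kernel whose intensity lies left of the ridge point is bounded by memory bandwidth, so its fraction of peak stays low even when it saturates the memory system. This is one way to read the gap between the 82% best case and the roughly 10% average reported above.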
