Computer Science ›› 2015, Vol. 42 ›› Issue (1): 75-78. doi: 10.11896/j.issn.1002-137X.2015.01.017
• 2013 National Conference on Theoretical Computer Science •
王一超,秦强,施忠伟,林新华
WANG Yi-chao, QIN Qiang, SEE Simon and LIN Xin-hua
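Abstract: OpenACC is a directive-based standard for parallel programming. Programmers add directives that conform to the standard to their code, and an OpenACC compiler then parallelizes the serial code and ports it to an accelerator or coprocessor, yielding the speedup that heterogeneous accelerators offer. Unlike heterogeneous parallel programming technologies such as CUDA and OpenCL, OpenACC aims to let programmers port applications without considering the underlying hardware architecture of the accelerator or coprocessor, which lowers the programming difficulty. It also offers good cross-platform portability, since a single code base can run on different hardware platforms. OpenACC is therefore a parallel programming standard worth studying. Today's heterogeneous accelerator hardware is becoming increasingly diverse. Tianhe-2, ranked first on the November 2013 Top500 list, uses 48,000 coprocessors built on the Intel Knights Corner architecture. Meanwhile, NVIDIA's recently released Kepler-architecture GPUs have quickly attracted a sizable user base, thanks to the company's years of accumulated presence in the GPU market. For those who port applications without pursuing peak performance, balancing application performance against ease of porting is an important concern. OpenACC, which requires only one code base to run on both hardware platforms, meets users' need for ease of porting. Once ease of porting is settled, what users most want to know is how the same application performs on different hardware platforms. Through experiments and a performance model, this paper shows the performance portability of applications ported with OpenACC on Intel Knights Corner and NVIDIA Kepler hardware.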
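The directive-based approach the abstract describes can be illustrated with a minimal sketch. The SAXPY kernel, the function name `saxpy`, and the choice of data clauses below are illustrative assumptions and are not taken from the paper. The sketch only shows how a single OpenACC directive might annotate an otherwise serial C loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): y[i] = a * x[i] + y[i].
 * The pragma is the only OpenACC-specific line. An OpenACC compiler may
 * offload the loop to an accelerator or coprocessor; a compiler without
 * OpenACC support ignores the pragma and runs the loop serially on the host. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* copyin: x is only read on the device.
     * copy:   y is transferred to the device and back to the host. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the directive is a pragma, the same source file still compiles as ordinary serial C when OpenACC is not enabled, which is the single-code-base property the abstract emphasizes. How well such a loop performs on a particular accelerator depends on the compiler and the hardware.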