Computer Science ›› 2012, Vol. 39 ›› Issue (3): 260-264.

• Architecture •

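Research on Image Blur Algorithm Optimization Using OpenCL

Zhang Ying (张樱), Zhang Yunquan (张云泉), Long Guoping (龙国平)

  1. (Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190); (Graduate University of Chinese Academy of Sciences, Beijing 100190)
  • Online: 2018-11-16 Published: 2018-11-16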

Abstract: Modern GPUs generally provide dedicated hardware (such as texture units, rasterization units, and various on-chip caches) to accelerate the processing and display of 2D images. The corresponding programming models (CUDA, OpenCL) define specific programming interfaces, namely CUDA's texture memory and OpenCL's image objects, so that image applications can exploit this hardware support. Taking the optimization of a typical image blur algorithm on an AMD GPU as an example, the paper examines the range of applicability of OpenCL image objects in optimizing image algorithms, and in particular analyzes their advantages and disadvantages relative to the more general optimization approach based on global memory plus on-chip local memory. The experimental results show that image objects yield a clear performance improvement only when the image has four channels and the amount of data that must be cached during computation is small; in all other cases, global memory combined with local memory achieves better performance. The optimized algorithm achieves a speedup of 200x to 1000x over a carefully implemented CPU version, and a speedup of 1.3x to 5x over the corresponding function in the NVIDIA NPP library.

Key words: AMD GPU, Blur, OpenCL, Image object
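
As a rough illustration of the computation being optimized, the following is a minimal reference sketch in Python. It is not the paper's OpenCL code. It assumes a box filter, since the abstract does not say which blur kernel was used, and it clamps out-of-range coordinates to the nearest edge pixel, similar in spirit to OpenCL's `CLK_ADDRESS_CLAMP_TO_EDGE` sampler mode. The names `box_blur` and `tile_footprint` are hypothetical. The second helper only counts how many channel values a square work-group tile would need to stage in on-chip storage once the halo is included, which is one plausible reading of "the amount of data that must be cached".

```python
def box_blur(image, radius):
    """Box-blur an image given as rows of pixels, each pixel a tuple of channels.

    Assumption: a plain (2*radius+1)^2 averaging window; coordinates that fall
    outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    window = (2 * radius + 1) ** 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sums = [0.0] * channels
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), height - 1)  # clamp row index
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)  # clamp column index
                    for c in range(channels):
                        sums[c] += image[yy][xx][c]
            row.append(tuple(s / window for s in sums))
        out.append(row)
    return out


def tile_footprint(tile, radius, channels):
    """Channel values a tile x tile work-group must stage, halo included.

    Hypothetical helper: models a square tile padded by `radius` on each side.
    """
    side = tile + 2 * radius
    return side * side * channels
```

Under these assumptions, a 16 x 16 tile of four-channel pixels stages 18 x 18 x 4 = 1296 values for radius 1 but 32 x 32 x 4 = 4096 values for radius 8, so the staged data grows quickly with the blur radius.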
