Computer Science, 2015, Vol. 42, Issue (11): 32-36. doi: 10.11896/j.issn.1002-137X.2015.11.005

• 2014 National Annual Conference on High Performance Computing •

Research on Histogram Generation Algorithm Optimization Based on OpenCL

AN Xiao-jing, ZHANG Yun-quan and JIA Hai-peng

  State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (all three authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    Supported by the National Natural Science Foundation of China (61272136) and the NSFC Science Fund for Creative Research Groups (61221062).

Abstract: As the computing power and programmability of GPUs keep improving, using GPUs as general-purpose accelerators has become the main way to speed up applications. Histogram generation is a common computer vision algorithm and is widely used in image processing, pattern recognition, image search and related fields. As image workloads grow and real-time requirements tighten, accelerating histogram generation on GPUs is increasingly in demand. Building on the key optimization methods and techniques for GPU computing platforms, we implemented and optimized the histogram generation algorithm on a GPU. Experimental results show that, with optimizations including histogram backup, memory access optimization, data localization and reduction optimization, the algorithm runs 1.8~13.3 times faster than the unoptimized version on an AMD HD 7850 GPU, and 7.2~210.8 times faster than the CPU version across different data sizes.

Key words: GPGPU, OpenCL, data localization, histogram generation
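The abstract lists histogram backup, memory access optimization, data localization and reduction as the main optimizations. The Python sketch below is a CPU-side simulation, not the authors' OpenCL kernel. It assumes that "histogram backup" means keeping several histogram copies per work-group so that concurrent updates to the same bin are spread out, and that the copies are then merged by a reduction. `NUM_GROUPS`, `GROUP_SIZE` and `NUM_COPIES` are hypothetical parameters. In a real OpenCL kernel each copy would sit in `__local` memory (data localization) and updates would need an atomic such as `atomic_inc`; a sequential Python loop needs neither.

```python
NUM_BINS = 256    # 8-bit grey levels
NUM_GROUPS = 4    # hypothetical number of OpenCL work-groups
GROUP_SIZE = 64   # hypothetical work-items per work-group
NUM_COPIES = 8    # hypothetical histogram copies ("backups") per work-group


def histogram_serial(data):
    """Baseline: one pass over the data into a single histogram."""
    hist = [0] * NUM_BINS
    for v in data:
        hist[v] += 1
    return hist


def histogram_replicated(data):
    """Simulates replicated per-group histograms plus a two-level reduction."""
    total_items = NUM_GROUPS * GROUP_SIZE
    # copies[g][c] stands in for one histogram copy in group g's local memory.
    copies = [[[0] * NUM_BINS for _ in range(NUM_COPIES)]
              for _ in range(NUM_GROUPS)]

    # Phase 1: each work-item reads a strided slice of the input, so at every
    # step consecutive work-items touch consecutive elements (a coalesced-style
    # pattern), and it updates the copy selected by its local id.
    for g in range(NUM_GROUPS):
        for lid in range(GROUP_SIZE):
            gid = g * GROUP_SIZE + lid
            copy = copies[g][lid % NUM_COPIES]
            for i in range(gid, len(data), total_items):
                copy[data[i]] += 1

    # Phase 2a: per-group reduction of the copies into one group histogram.
    group_hists = [[sum(c[b] for c in copies[g]) for b in range(NUM_BINS)]
                   for g in range(NUM_GROUPS)]
    # Phase 2b: reduction across groups into the final histogram.
    return [sum(h[b] for h in group_hists) for b in range(NUM_BINS)]


if __name__ == "__main__":
    image = [(i * 37 + (i >> 3)) & 0xFF for i in range(10_000)]
    print(histogram_replicated(image) == histogram_serial(image))  # True
```

Both functions produce identical counts, because every input index is visited by exactly one simulated work-item. Replication only changes how updates are distributed: on a GPU that reduces atomic collisions on frequently hit bins, whereas in this sequential simulation it brings no speedup.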

