Computer Science, 2013, Vol. 40, Issue (2): 1-7.
Abstract: The image integral algorithm is widely used in fast feature detection, so accelerating it on GPUs is of significant practical value. However, because GPU hardware architectures are complex and differ from one GPU to another, optimizing this algorithm while achieving performance portability across GPU platforms remains difficult. This paper analyzed the differences between the underlying hardware architectures of GPUs and studied how different optimization methods affect performance on different GPU platforms, considering off-chip memory bandwidth utilization, computing resource utilization, data locality, and other aspects. Based on this analysis, we implemented the image integral algorithm in OpenCL. Experimental results show that the optimized algorithm achieves speedups of 11.26 and 12.38 times on AMD and NVIDIA GPUs, respectively, and that the optimized kernel outperforms the CUDA version in the NVIDIA NPP library by 55.01% and 65.17%, which verifies the effectiveness and cross-platform capability of the optimization methods.
Key words: OpenCL, GPU, Image integral algorithm, Cross-platform
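The abstract does not show the kernels themselves. As a point of reference only, the integral image (summed-area table) being accelerated can be computed as an inclusive prefix sum along each row followed by an inclusive prefix sum down each column. The sketch below is a sequential CPU reference under that standard formulation, not the authors' OpenCL implementation; the function names `integral_image` and `box_sum` are hypothetical. It also shows why the structure is useful for feature detection: once the table is built, the sum over any axis-aligned rectangle takes at most four lookups.

```python
from typing import List

Matrix = List[List[int]]


def integral_image(img: Matrix) -> Matrix:
    """Hypothetical reference: out[y][x] = sum of img[j][i] for all j <= y, i <= x.

    Computed in two separable passes. Each row scan (pass 1) and each
    column scan (pass 2) is independent of the others, which is what makes
    these passes natural units of parallel work on a GPU.
    """
    h = len(img)
    w = len(img[0]) if h else 0

    # Pass 1: inclusive prefix sum along each row.
    out: Matrix = []
    for y in range(h):
        acc = 0
        row = []
        for x in range(w):
            acc += img[y][x]
            row.append(acc)
        out.append(row)

    # Pass 2: inclusive prefix sum down each column, in place.
    for x in range(w):
        acc = 0
        for y in range(h):
            acc += out[y][x]
            out[y][x] = acc
    return out


def box_sum(ii: Matrix, x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of the original image over the inclusive rectangle
    [x0..x1] x [y0..y1], using at most four lookups into ii."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]  # remove the strip left of the box
    if y0 > 0:
        total -= ii[y0 - 1][x1]  # remove the strip above the box
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]  # add back the doubly removed corner
    return total


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(img)
    print(ii)                       # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
    print(box_sum(ii, 1, 1, 2, 2))  # 28 (= 5 + 6 + 8 + 9)
```

On a GPU, the sequential inner loops would typically be replaced by parallel scans, and the choice of how rows and columns are assigned to work-items is where concerns such as memory bandwidth utilization and data locality, named in the abstract, come into play.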
URL: https://www.jsjkx.com/EN/Y2013/V40/I2/1