Computer Science (计算机科学) ›› 2011, Vol. 38 ›› Issue (8): 284-286.

• Architecture •

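Implementation and Optimization of the FFT Using OpenCL on Heterogeneous Platforms

LI Yan, ZHANG Yun-quan, WANG Ke, ZHAO Mei-chao

  1. (Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190); (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190); (Graduate University of Chinese Academy of Sciences, Beijing 100190)
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the National 863 Program of China (2006AA01A125, 2009AA01A129, 2009AA01A134) and the National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2009GX01036-001-002).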

Abstract: Fourier methods have revolutionized fields of science and engineering, from astronomy to medical imaging and from seismology to spectroscopy. The fast Fourier transform (FFT), an efficient algorithm for computing the discrete Fourier transform (DFT) and its inverse, is widely recognized as one of the most important fundamental algorithms of the 20th century, with broad application in large-scale scientific computing, digital signal processing, and graphics and image simulation. OpenCL (Open Computing Language) is the first general-purpose parallel programming standard for heterogeneous systems. It is a framework for writing programs that execute across platforms consisting of CPUs, GPUs, and other processors, and it supports both task-based and data-based parallelism. In this paper, we first implement a one-dimensional power-of-two FFT in OpenCL, then test and analyze its performance on the heterogeneous multi-core platforms Cell and NVIDIA GPU. The implementation reaches about 65% of the Cell SDK's performance at moderate data sizes, with relative performance dropping as the data size grows further, and about 75% of CUDA CUFFT; it needs further improvement in the near future. Furthermore, by hand-tuning small-factor FFTs to exploit hardware features of the NVIDIA Fermi GPU, we reach nearly 140% of CUFFT performance when the FFT size is small.

Key words: FFT, OpenCL, Cell, CUDA, GPU, Fast Fourier transform
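
For readers unfamiliar with the power-of-two one-dimensional FFT named in the abstract, the following is a minimal sequential sketch of a radix-2 decimation-in-time transform in Python. It is an illustration only and is not the authors' OpenCL kernel; the function names `fft_radix2` and `dft_naive` are hypothetical, and no Cell- or GPU-specific optimization is attempted.

```python
import cmath


def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (illustrative sketch).

    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages; each stage merges pairs of half-size transforms.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

In a parallel setting, the independent butterflies inside each stage are the natural unit of data parallelism; how the paper maps them to work-items is not stated in the abstract.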
