Computer Science ›› 2020, Vol. 47 ›› Issue (1): 31-39.doi: 10.11896/jsjkx.190900179

• Computer Architecture • Previous Articles     Next Articles

High-performance Implementation Method for Even Basis of Cooley-Tukey FFT

GONG Tong-yan1,2,ZHANG Guang-ting2,JIA Hai-peng2,YUAN Liang2   

  1. (School of Information,Guizhou University of Finance and Economics,Guiyang 550025,China)1;
    (State Key Laboratory of Computer Architecture,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China)2
  • Received:2019-08-21 Published:2020-01-19
  • About author:GONG Tong-yan,born in 1992,postgraduate,not member of China Computer Federation (CCF).Her main research interests include parallel processing and high-performance computing;ZHANG Guang-ting,born in 1987,postgraduate,is member of China Computer Federation (CCF).Her main research interests include parallel algorithms and big data.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2018YFC0809306),Young Scientists Fund of the National Natural Science Foundation of China (61602442) and Key Program of the National Natural Science Foundation of China (61432018).

Abstract: Fast Fourier transform (FFT) is one of the most important basic algorithms,which is widely used in scientific calculation,signal processing,image processing and other fields.With the further improvement of real-time requirements in these application fields,fast Fourier transform algorithms are facing higher and higher performance requirements.In the existing FFT algorithm library,the solution speed and calculation accuracy of FFT algorithm are limited to a certain extent,and few researchers put forward corresponding optimization strategies and conducted in-depth research on the implementation of cooley-tukey fast Fourier transform based on even Numbers.Based on this,this paper put forward a set of for even basis of optimization strategy and methodfor Colley-Turkey fast Fourier transform.Firstly,a friendly butterfly network supporting SIMD mixed is constructed.Secondly,according to the even base rotation factor characteristics,the complexity of the butterfly calculation is reduced to a maximum degree.Thirdly,through the SIMD assembly optimization,assembly instruction rearrangement and selection,register allocation strategy and high performance matrix transpose algorithm method,the application is optimized .Finally a high performance FFT algorithm library is achieved.Currently,the most popular and widely used FFT are FFTW and Intel MKL.Experimental results show that on X86 computing platform,the performance of FFT library based on cooley-tukey FFT is better than MKL and FFTW.The high performance algorithm is put forward by the new optimization method and implementation technology system,which can be generalized to other except the even base based on the realization and optimization of a certain basis for further research and development work,to break through the FFT algorithm performance bottlenecks in the hardware platform,to achieve a high performance FFT algorithms library for a specific platform.

Key words: Fast Fourier transform algorithm, Even basis, Butterfly calculation optimization, Butterfly network optimization, SIMD assembly optimization, High performance FFT library

CLC Number: 

  • TP311.52
[1]COOLEY J W,TUKEY J W.An algorithm for the machine calculation of complex Fourier series[J].Mathematics of Computation,1965,19(90):297-301.
[2]COCHRAN W T,COOLEY J W,FAVIN D L,et al.What is the fast Fourier transform?[J].Proceedings of the IEEE,1967,55(10):1664-1674.
[3]DUHAMEL P,VETTERLI M.Fast Fourier transforms:a tutorial review and a state of the art[J].Signal Processing,1990,19(4):259-299.
[4]STRANG,GILBERT.“Wavelets”[J].American Scientist, 1994,2(3):250-255.
[5]KENT R D,READ C.Acoustic Analysis of Speech[M].Singular Publishing Group,2002.
[6]DONGARRA J,SULLIVAN F.Guest editors' introduction:The top 10 algorithms[J].Computing in Science & Engineering,2000,2(1):22-23.
[7]FRIGO M,JOHNSON S G.FFTW:An adaptive software architecture for the FFT[C]∥Proceedings of the 1998 IEEE International Conference on Acoustics,Speech and Signal Processing.IEEE,1998,3:1381-1384.
[8]Intel® Math Kernel Library Release Notes and New Features .(2017-09-10) .
[9]Developer Reference for Intel® Math Kernel Library - C .(2019-09-10) .
[10]FRIGO M,JOHNSON S G.The Design and Implementation of FFTW3∥Proceedings of the IEEE.2005,216-231.
[11]JOHNSON S G,FRIGO M.Implementing FFTs in practice.
[12]MOHAMED K,ALI F H H M,ARIFFIN S,et al.An Improved AES S-box Based on Fibonacci Numbers and Prime Factor.IJ Network Security,2018,20(6):1206-1214.
[13]JOHNSON S G,FRIGO M.A modified split-radix FFT with fewer arithmetic operations.Signal Processing,2007,55(1):111-119.
[14]DUHAMEL P,HOLLMANN H.Split radix'FFT algorithm.Electronics letters,1984,20(1):14-16.
[15]BADAR S,DANDEKAR D R.High speed FFT processor design using radix? 4 pipelined architecture[C]∥2015 International Conference on Industrial Instrumentation and Control (ICIC).IEEE,2015:1050-1055.
[16]NUGTEREN C.Clblast:A tuned opencl BLAS library[C]∥Proceedings of the International Workshop on OpenCL.ACM,2018:5.
[17]NAKATA M.Basics and Practice of Linear Algebra Calculation Library BLAS and LAPACK[M]∥The Art of High Performance Computing for Computational Science. Singapore:Springer,2019:83-112.
[18]BUJANOVIC' Z,DRMACˇ Z.New robust ScaLAPACK routine for computing the QR factorization with column pivoting.arXiv:1910.05623,2019.
[19]MULTIPLICATION M,SICARD-RAMÍREZ A.Intel R G Math Kernel Library.Universidad EAFIT,2018,2018:3-21.
[20]INTEL R.Math kernel library[J/OL].
[21]FOG A. Intel’s “cripple AMD” function.(2009-12-30).
[22]YANG M R.MKL has bad performances on an AMD cpu[EB/QL].(2019-02-02). ngru-yang/programming/mkl-has-bad-performance-on-an-amd-cpu.
[23]The NVIDIA CUDA Fast Fourier Transform library [EB/OL].(2019-08-14) .
[24]ALGLIB User Guide online [EB/OL].(2019-02-21) .
[25]EISL J,GRIMMER M,SIMON D,et al.Trace-based Register Allocation in a JIT Compiler[C]∥Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform:Virtual Machines,Languages,and Tools.ACM,2016:14.
[26]EISL J,LEOPOLDSEDER D,MÖSSENBÖCK H.Parallel trace register allocation[C]∥Conference on Managed Languages & Runtimes (ManLang’18).2018.
No related articles found!
Full text



[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75, 88 .
[2] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[3] WANG Zhen-wu, LV Xiao-hua and HAN Xiao-hui. Survey of Terrain LOD Technology Based on Quadtree Segmentation[J]. Computer Science, 2018, 45(4): 34 -45 .
[4] SHI Chao, XIE Zai-peng, LIU Han and LV Xin. Optimization of Container Deployment Strategy Based on Stable Matching[J]. Computer Science, 2018, 45(4): 131 -136 .
[5] PANG Bo, JIN Qian-kun, HENIGULI·Wu Mai Er and QI Xing-bin. Routing Scheme Based on Network Slicing and ILP Model in SDN[J]. Computer Science, 2018, 45(4): 143 -147 .
[6] CAI Li, LIANG Yu, ZHU Yang-yong and HE Jing. History and Development Tendency of Data Quality[J]. Computer Science, 2018, 45(4): 1 -10 .
[7] LI Hui, ZHOU Lin and XIN Wen-bo. Optimization of Networked Air-defense Operational Formation Structure Based on Bilevel Programming[J]. Computer Science, 2018, 45(4): 266 -272, 300 .
[8] HU Qing-cheng, ZHANG Yong, XING Chun-xiao. K-clique Heuristic Algorithm for Influence Maximization in Social Network[J]. Computer Science, 2018, 45(6): 32 -35 .
[9] ZHANG Pan-pan, PENG Chang-gen, HAO Chen-yan. Privacy Protection Model and Privacy Metric Methods Based on Privacy Preference[J]. Computer Science, 2018, 45(6): 130 -134 .
[10] JI Hai-juan, ZHOU Cong-hua, LIU Zhi-feng. Symbolic Aggregate Approximation Method of Time Series Based on Beginning and End Distance[J]. Computer Science, 2018, 45(6): 216 -221 .