Computer Science ›› 2020, Vol. 47 ›› Issue (1): 31-39.doi: 10.11896/jsjkx.190900179

• Computer Architecture • Previous Articles     Next Articles

High-performance Implementation Method for Even Basis of Cooley-Tukey FFT

GONG Tong-yan1,2,ZHANG Guang-ting2,JIA Hai-peng2,YUAN Liang2   

  1. (School of Information,Guizhou University of Finance and Economics,Guiyang 550025,China)1;
    (State Key Laboratory of Computer Architecture,Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China)2
  • Received:2019-08-21 Published:2020-01-19
  • About author:GONG Tong-yan,born in 1992,postgraduate,not member of China Computer Federation (CCF).Her main research interests include parallel processing and high-performance computing;ZHANG Guang-ting,born in 1987,postgraduate,is member of China Computer Federation (CCF).Her main research interests include parallel algorithms and big data.
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2018YFC0809306),Young Scientists Fund of the National Natural Science Foundation of China (61602442) and Key Program of the National Natural Science Foundation of China (61432018).

Abstract: Fast Fourier transform (FFT) is one of the most important basic algorithms,which is widely used in scientific calculation,signal processing,image processing and other fields.With the further improvement of real-time requirements in these application fields,fast Fourier transform algorithms are facing higher and higher performance requirements.In the existing FFT algorithm library,the solution speed and calculation accuracy of FFT algorithm are limited to a certain extent,and few researchers put forward corresponding optimization strategies and conducted in-depth research on the implementation of cooley-tukey fast Fourier transform based on even Numbers.Based on this,this paper put forward a set of for even basis of optimization strategy and methodfor Colley-Turkey fast Fourier transform.Firstly,a friendly butterfly network supporting SIMD mixed is constructed.Secondly,according to the even base rotation factor characteristics,the complexity of the butterfly calculation is reduced to a maximum degree.Thirdly,through the SIMD assembly optimization,assembly instruction rearrangement and selection,register allocation strategy and high performance matrix transpose algorithm method,the application is optimized .Finally a high performance FFT algorithm library is achieved.Currently,the most popular and widely used FFT are FFTW and Intel MKL.Experimental results show that on X86 computing platform,the performance of FFT library based on cooley-tukey FFT is better than MKL and FFTW.The high performance algorithm is put forward by the new optimization method and implementation technology system,which can be generalized to other except the even base based on the realization and optimization of a certain basis for further research and development work,to break through the FFT algorithm performance bottlenecks in the hardware platform,to achieve a high performance FFT algorithms library for a specific platform.

Key words: Butterfly calculation optimization, Butterfly network optimization, Even basis, Fast Fourier transform algorithm, High performance FFT library, SIMD assembly optimization

CLC Number: 

  • TP311.52
[1]COOLEY J W,TUKEY J W.An algorithm for the machine calculation of complex Fourier series[J].Mathematics of Computation,1965,19(90):297-301.
[2]COCHRAN W T,COOLEY J W,FAVIN D L,et al.What is the fast Fourier transform?[J].Proceedings of the IEEE,1967,55(10):1664-1674.
[3]DUHAMEL P,VETTERLI M.Fast Fourier transforms:a tutorial review and a state of the art[J].Signal Processing,1990,19(4):259-299.
[4]STRANG,GILBERT.“Wavelets”[J].American Scientist, 1994,2(3):250-255.
[5]KENT R D,READ C.Acoustic Analysis of Speech[M].Singular Publishing Group,2002.
[6]DONGARRA J,SULLIVAN F.Guest editors' introduction:The top 10 algorithms[J].Computing in Science & Engineering,2000,2(1):22-23.
[7]FRIGO M,JOHNSON S G.FFTW:An adaptive software architecture for the FFT[C]∥Proceedings of the 1998 IEEE International Conference on Acoustics,Speech and Signal Processing.IEEE,1998,3:1381-1384.
[8]Intel® Math Kernel Library Release Notes and New Features .(2017-09-10) .
[9]Developer Reference for Intel® Math Kernel Library - C .(2019-09-10) .
[10]FRIGO M,JOHNSON S G.The Design and Implementation of FFTW3∥Proceedings of the IEEE.2005,216-231.
[11]JOHNSON S G,FRIGO M.Implementing FFTs in practice.
[12]MOHAMED K,ALI F H H M,ARIFFIN S,et al.An Improved AES S-box Based on Fibonacci Numbers and Prime Factor.IJ Network Security,2018,20(6):1206-1214.
[13]JOHNSON S G,FRIGO M.A modified split-radix FFT with fewer arithmetic operations.Signal Processing,2007,55(1):111-119.
[14]DUHAMEL P,HOLLMANN H.Split radix'FFT algorithm.Electronics letters,1984,20(1):14-16.
[15]BADAR S,DANDEKAR D R.High speed FFT processor design using radix? 4 pipelined architecture[C]∥2015 International Conference on Industrial Instrumentation and Control (ICIC).IEEE,2015:1050-1055.
[16]NUGTEREN C.Clblast:A tuned opencl BLAS library[C]∥Proceedings of the International Workshop on OpenCL.ACM,2018:5.
[17]NAKATA M.Basics and Practice of Linear Algebra Calculation Library BLAS and LAPACK[M]∥The Art of High Performance Computing for Computational Science. Singapore:Springer,2019:83-112.
[18]BUJANOVIC' Z,DRMACˇ Z.New robust ScaLAPACK routine for computing the QR factorization with column pivoting.arXiv:1910.05623,2019.
[19]MULTIPLICATION M,SICARD-RAMÍREZ A.Intel R G Math Kernel Library.Universidad EAFIT,2018,2018:3-21.
[20]INTEL R.Math kernel library[J/OL].
[21]FOG A. Intel’s “cripple AMD” function.(2009-12-30).
[22]YANG M R.MKL has bad performances on an AMD cpu[EB/QL].(2019-02-02). ngru-yang/programming/mkl-has-bad-performance-on-an-amd-cpu.
[23]The NVIDIA CUDA Fast Fourier Transform library [EB/OL].(2019-08-14) .
[24]ALGLIB User Guide online [EB/OL].(2019-02-21) .
[25]EISL J,GRIMMER M,SIMON D,et al.Trace-based Register Allocation in a JIT Compiler[C]∥Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform:Virtual Machines,Languages,and Tools.ACM,2016:14.
[26]EISL J,LEOPOLDSEDER D,MÖSSENBÖCK H.Parallel trace register allocation[C]∥Conference on Managed Languages & Runtimes (ManLang’18).2018.
[1] YU Dun-hui, CHENG Tao, YUAN Xu. Software Crowdsourcing Task Recommendation Algorithm Based on Learning to Rank [J]. Computer Science, 2020, 47(12): 106-113.
[2] WANG Dong, SHANG Hong-hui, ZHANG Yun-quan, LI Kun, HE Xin-fu, JIA Li-xia. Application of Atomic Dynamics Monte Carlo Program MISA-KMC in Study of Irradiation Damage of Reactor Pressure Vessel Steel [J]. Computer Science, 2020, 47(4): 30-35.
[3] QU Guang-qiang, SUN Bin. Study on Trustworthy Backtracking Mechanism of Experimental Teaching Fund Based on Blockchain [J]. Computer Science, 2019, 46(11A): 553-556.
[4] HOU Yu-chen, WU Wei. Design and Implementation of Crowdsourcing System for Still Image Activity Annotation [J]. Computer Science, 2019, 46(11A): 580-583.
[5] LV Xiao-hu, HAN Xiao-dong, GONG Jiang-lei, WANG Zhi-jie, LIU Xiao-kun. Systemic Muti-factors Based Verification Method for Safety-critical Software [J]. Computer Science, 2019, 46(9): 156-161.
[6] YAO Qing, ZHENG Kai, LIU Yao, WANG Su, SUN Jun, XU Meng-xuan. Implementation and Optimization of SOM Algorithm on Sunway Many-core Processors [J]. Computer Science, 2018, 45(11A): 591-596.
[7] LIU Zheng-hao, ZHANG Guang-quan. Distributed Knowledge Framework of IOT Software Based on WSAN and Its Implementation Strategy [J]. Computer Science, 2019, 46(1): 148-154.
[8] HAO Jun-sheng,LI Bing-feng,CHEN Xi,GAO Wen-juan. Design and Implementation of Network Subscription System Based on Android Platform [J]. Computer Science, 2018, 45(6A): 591-594.
Full text



No Suggested Reading articles found!