Computer Science, 2020, Vol. 47, Issue (8): 17-15. doi: 10.11896/jsjkx.200100124
CUI Xiang(1,2), LI Xiao-wen(3), CHEN Yi-feng(1)
[1] NUMRICH R W. Co-array Fortran for parallel programming[C]∥ACM SIGPLAN Fortran Forum. 1998.
[2] YELICK K A, SEMENZATO L, PIKE G, et al. Titanium: A High-performance Java Dialect[J]. Concurrency: Practice & Experience, 1998, 10(11/12/13): 825-836.
[3] HILFINGER P N, BONACHEA D, GAY D, et al. Titanium language reference manual v1[J]. The Url & Gt/02/17 IEEE, 2005, 20(4): 102-103.
[4] ZHANG F, XIE F Y, CHEN S L, et al. Predictions of titanium alloy properties using thermodynamic modeling tools[J]. Journal of Materials Engineering & Performance, 2005, 14(6): 717-721.
[5] CHRISTOULIS D K, GUETTA S, GUIPONT V, et al. The influence of the substrate on the deposition of cold sprayed titanium: an experimental and numerical study[J]. Journal of Thermal Spray Technology, 2011, 20(3): 523-533.
[6] NIEPLOCHA J, KRISHNAN M, et al. Global Arrays Parallel Programming Toolkit[J]. Encyclopedia of Parallel Computing, 2011: 779-787.
[7] UPC Consortium. UPC Language Specifications V1.2[J]. Lawrence Berkeley National Laboratory, 2005(7): 146-159.
[8] ELGHAZAWI T, CARLSON W, et al. UPC: Distributed Shared Memory Programming[M]. Wiley-Interscience, 2003.
[9] ELGHAZAWI T, CARLSON W, STERLING T, et al. UPC (Distributed Shared Memory Programming) || Performance Tuning and Optimization. https://www.onacademic.com/detail/journal_1000040810431210_4d1d.html.
[10] GOVINDARAJU N K, LLOYD B, DOTSENKO Y, et al. High performance discrete Fourier transforms on graphics processors[C]∥Proceedings of the ACM/IEEE Conference on High Performance Computing (SC 2008). Austin, Texas, USA. ACM, 2008.
[11] MICIKEVICIUS P. 3D finite difference computation on GPUs using CUDA[C]∥Workshop on General Purpose Processing on Graphics Processing Units. 2009.
[12] FRANCHETTI F, PUSCHEL M, VORONENKO Y, et al. Discrete Fourier transform on multicore[J]. IEEE Signal Processing Magazine, 2009, 26(6): 90-102.
[13] VOLKOV V, KAZIAN B. FFT prototype[EB/OL]. (2014-12-30)[2018-07-28]. http://www.cs.berkeley.edu/ volkov/.
[14] DOTSENKO Y, BAGHSORKHI S S, LLOYD B, et al. Auto-tuning of fast Fourier transform on graphics processors[J]. ACM SIGPLAN Notices, 2011, 46(8): 257.
[15] LI Y, ZHANG Y Q, LIU Y Q, et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs[J]. Journal of Computer Science & Technology, 2013, 28(1): 90-105.
[16] NUKADA A, MATSUOKA S. Auto-tuning 3-D FFT library for CUDA GPUs[C]∥Proceedings of the ACM/IEEE Conference on High Performance Computing (SC 2009). Portland, Oregon, USA, ACM, 2009: 14-20.
[17] NUKADA A, OGATA Y, ENDO T, et al. Bandwidth Intensive 3-D FFT Kernel for GPUs using CUDA[C]∥International Conference for High Performance Computing. 2008.
[18] LI Y, ZHANG Y Q, LIU Y Q, et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs[J]. Journal of Computer Science and Technology, 2013(1): 90-105.
[19] MUNDO C D, FENG W C. Towards a performance-portable FFT library for heterogeneous computing[M]. ACM, 2014.
[20] Fastest Fourier Transform in the West[EB/OL]. (2014-12-30)[2018-07-28]. http://www.fftw.org/.
[21] Intel Corp. Intel-mkl[EB/OL]. (2014-12-30)[2018-07-28]. http://software.Intel.com/en-us/Intel-mkl/.
[22] Parallel three-dimensional fast Fourier transforms[EB/OL]. (2014-12-30)[2018-07-28]. http://www.sdsc.edu/us/resources/p3dfft/.
[23] PEKUROVSKY D. P3DFFT: A Framework for Parallel Computations of Fourier Transforms in Three Dimensions[J]. SIAM Journal on Scientific Computing, 2012, 34(4): C192-C209.
[24] AYALA O, WANG L P. Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition[J]. Parallel Computing, 2013, 39(1): 58-77.
[25] PIPPIG M. PFFT: An Extension of FFTW to Massively Parallel Architectures[J]. SIAM Journal on Scientific Computing, 2013, 35(3): C213-C236.
[26] ELEFTHERIOU M, FITCH B G, RAYSHUBSKIY A, et al. Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements[J]. IBM Journal of Research and Development, 2005, 49(2/3): 457-464.
[27] SABHARWAL Y, GARG S K, GARG R, et al. Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer[C]∥International Conference on High-Performance Computing. Berlin: Springer, 2008.
[28] 2decomp&fft[EB/OL]. (2014-12-30)[2018-07-28]. http://www.2decomp.org/.
[29] KANDALLA K, SUBRAMONI H, TOMKO K, et al. High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT[J]. Computer Science, 2011, 26(3/4): 237-246.
[30] RAHIMIAN A, LASHUK I, VEERAPANENI S, et al. Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures[C]∥High Performance Computing, Networking, Storage & Analysis. IEEE, 2010.
[31] HAMADA T, NITADORI K, BENKRID K, et al. A novel multiple-walk parallel algorithm for the Barnes-Hut treecode on GPUs - towards cost effective, high performance N-body simulation[J]. Computer Science, 2009, 24(1/2): 21-31.
[32] SONG S, HOLLINGSWORTH J K. Designing and Auto-Tuning Parallel 3-D FFT for Computation-Communication Overlap[C]∥ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming. ACM, 2014.
[33] BELL C, BONACHEA D, NISHTALA R, et al. Optimizing bandwidth limited problems using one-sided communication and overlap[C]∥IEEE International Parallel & Distributed Processing Symposium. IEEE, 2006.
[34] FANG B, DENG Y, MARTYNA G. Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer[J]. Computer Physics Communications, 2007, 176(8): 531-538.
[35] WU J, XIONG X, BERROCAL E, et al. Topology mapping of irregular parallel applications on torus-connected supercomputers[J]. Journal of Supercomputing, 2017, 73(4): 1691-1714.
[36] NUKADA A, SATO K, MATSUOKA S. Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer[C]∥International Conference for High Performance Computing, Networking, Storage & Analysis. IEEE, 2012.
[37] CZECHOWSKI K, BATTAGLINO C, MCCLANAHAN C, et al. On the Communication Complexity of 3D FFTs and its Implications for Exascale[C]∥Proc. ACM Int’l Conf. Supercomputing (ICS). ACM, 2012.
[38] HPCC[EB/OL]. (2014-12-30)[2018-07-28]. http://icl.cs.utk.edu/hpcc/index.html.
[39] ORSZAG S A, PATTERSON G S. Numerical Simulation of Three-Dimensional Homogeneous Isotropic Turbulence[J]. Physical Review Letters, 1972, 28(2): 76-79.
[40] NICOLAI C, JACOB B, GUALTIERI P, et al. Inertial Particles in Homogeneous Shear Turbulence: Experiments and Direct Numerical Simulation[J]. Flow, Turbulence & Combustion, 2014, 92(1/2): 65-82.
[41] CHEN S, SHAN X. High-resolution turbulent simulations using the Connection Machine-2[J]. Computers in Physics, 1992, 176(8): 531-538.
[42] YOKOKAWA M. 16.4-Tflops direct numerical simulation of turbulence by a Fourier spectral method on the Earth Simulator[J]. Proc. IEEE/ACM SC Conf. Baltimore, 2002, 20(3): 523-533.
[43] YEUNG P K, POPE S B, LAMORGESE A G, et al. Acceleration and dissipation statistics of numerically simulated isotropic turbulence[J]. Physics of Fluids, 2006, 18(6): 065103.
[44] WATANABE T, GOTOH T. Inertial-range intermittency and accuracy of direct numerical simulation for turbulence and passive scalar turbulence[J]. Journal of Fluid Mechanics, 2007, 590: 117-146.
[45] LI Y, PERLMAN E, WAN M, et al. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence[J]. Journal of Turbulence, 2008, 9(31): N31.
[46] KANEDA Y, ISHIHARA T, YOKOKAWA M, et al. Energy dissipation rate and energy spectrum in high resolution direct numerical simulations of turbulence in a periodic box[J]. Physics of Fluids, 2003, 15(2).
[47] ISHIHARA T, KANEDA Y, YOKOKAWA M, et al. Small-scale statistics in high-resolution direct numerical simulation of turbulence: Reynolds number dependence of one-point velocity gradient statistics[J]. Journal of Fluid Mechanics, 2007, 592: 335-366.
[48] CHEN Y F, CUI X, MEI H. PARRAY: a unifying array representation for heterogeneous parallelism[J]. ACM SIGPLAN Notices, 2012, 47(8): 171-180.
[49] PARRAY Manual[EB/OL]. (2014-12-30)[2018-07-28]. http://code.google.com/p/parray-programming/.
[50] CHEN Y, CUI X, MEI H. Large-scale FFT on GPU clusters[C]∥Proceedings of the 24th International Conference on Supercomputing. Tsukuba, Ibaraki, Japan, ACM, 2010: 2-4.