Computer Science ›› 2020, Vol. 47 ›› Issue (8): 17-15. doi: 10.11896/jsjkx.200100124


Communication Optimization Method of Heterogeneous Cluster Application Based on New Language Mechanism

CUI Xiang1, 2, LI Xiao-wen3, CHEN Yi-feng1   

    1 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
    2 College of Computer & Information Engineering, Henan University, Kaifeng, Henan 475000, China
    3 College of Computer and Information Technology, Henan Finance University, Zhengzhou 450000, China
  • Online:2020-08-15 Published:2020-08-10
  • About author: CUI Xiang, born in 1975, Ph.D., associate professor. His main research interests include programming methods for heterogeneous clusters.
    LI Xiao-wen, born in 1984, lecturer. Her main research interests include parallel programming methods.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2017YFB0202001) and the National Natural Science Foundation of China (61672208).

Abstract: Compared with traditional clusters, heterogeneous clusters have a clear advantage in cost-performance. However, software technology lags behind the rapidly developing hardware and cannot keep up with constantly updated heterogeneous hardware or with very-large-scale parallel computing environments. The common solution today is to directly combine the parallel programming tools native to each kind of hardware. The drawbacks of this combined approach are that programming stays at a low level and that programs are difficult to develop, modify and debug. This paper introduces a new language mechanism for describing the regular multi-dimensional structure, layout and communication patterns of data and threads. Based on this mechanism, a new method for migrating and optimizing software across heterogeneous systems is proposed. Taking the direct numerical simulation of turbulence as an example, communication optimization and fast migration across different heterogeneous systems are realized.

Key words: Direct simulation method for turbulence, FFT, Heterogeneous cluster, Programming method

CLC Number: TP312