Computer Science (计算机科学) ›› 2020, Vol. 47 ›› Issue (8): 17-15. doi: 10.11896/jsjkx.200100124

• High Performance Computing •

Communication Optimization Method of Heterogeneous Cluster Application Based on New Language Mechanism

CUI Xiang1, 2, LI Xiao-wen3, CHEN Yi-feng1

  1. 1 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
    2 College of Computer & Information Engineering, Henan University, Kaifeng, Henan 475000, China
    3 College of Computer and Information Technology, Henan Finance University, Zhengzhou 450000, China
  • Online: 2020-08-15 Published: 2020-08-10
  • Corresponding author: LI Xiao-wen (1206375360@pku.edu.cn)
  • Author contact: cui@pku.edu.cn
  • About author: CUI Xiang, born in 1975, Ph.D., associate professor. His main research interests include programming methods for heterogeneous clusters.
    LI Xiao-wen, born in 1984, lecturer. Her main research interests include parallel programming methods.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2017YFB0202001) and the National Natural Science Foundation of China (61672208).

Abstract: Compared with traditional clusters, heterogeneous clusters offer a clear advantage in cost performance. However, software technology still lags behind rapidly developing hardware and cannot keep pace with constantly updated heterogeneous hardware and very-large-scale parallel computing environments. The common solution today is to use the parallel programming tools for each kind of hardware directly. The drawbacks of this combined approach are a low programming level and difficult development, modification, and debugging. This paper introduces a new language mechanism for describing the multi-dimensional regular structure, layout, and communication patterns of data and threads, and proposes a method based on this mechanism for porting and optimizing software across different types of heterogeneous systems. Taking direct numerical simulation of turbulence as an example, communication optimization and fast porting of the application across different heterogeneous systems are achieved.

Key words: Heterogeneous cluster, Programming method, Direct numerical simulation of turbulence, Fast Fourier transform (FFT)

CLC Number: 

  • TP312