Computer Science (计算机科学), 2020, Vol. 47, Issue (8): 17-15. doi: 10.11896/jsjkx.200100124
Special topic: High-Performance Computing
CUI Xiang (崔翔)1, 2, LI Xiao-wen (李晓雯)3, CHEN Yi-feng (陈一峯)1
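Abstract: Compared with traditional clusters, heterogeneous clusters offer a better performance-to-price ratio. However, software technology still lags behind the rapid development of hardware and cannot keep pace with continually updated heterogeneous hardware and ultra-large-scale parallel computing environments. The prevailing solution is to directly combine the parallel programming tools designed for each kind of hardware. The drawbacks of this combined approach are its low level of programming and the difficulty of development, modification, and debugging. This paper introduces new language mechanisms for describing the multidimensional regular structures, layouts, and communication patterns of data and threads, and proposes a method, based on these mechanisms, for porting and optimizing software across different types of heterogeneous systems. Taking direct numerical simulation of turbulence as an example, communication optimization and rapid porting of the application are achieved on different heterogeneous systems.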