Computer Science ›› 2020, Vol. 47 ›› Issue (8): 56-61. doi: 10.11896/jsjkx.200200112
Special topic: High Performance Computing
CHENG Sheng-gan1, YU Hao-ran2, WEI Jian-wen1, James LIN1
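Abstract: Modern astrophysics research relies on large-scale N-body simulations. One of the algorithms commonly used for N-body simulation is the particle-mesh (PM) algorithm, but it has a large memory footprint, and this memory limit has become the bottleneck for scaling N-body simulations on modern supercomputing platforms. This paper therefore applies fixed-point compression to reduce memory consumption, lowering the memory needed to store the phase space of each N-body particle to as little as 6 bytes, nearly an order of magnitude less than the traditional PM algorithm. The paper implements a two-level particle-mesh algorithm based on fixed-point compression and optimizes its performance with techniques including mixed-precision computation and communication optimization. These optimizations significantly reduce the performance overhead introduced by fixed-point compression: the share of total run time spent on compression and decompression falls from 21% to 8%, and the core computational hotspots achieve speedups of up to 2.3x, so the program keeps high computational efficiency and scalability at low memory consumption.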
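To make the 6-byte figure concrete, the following is a minimal sketch of one way fixed-point compression of a particle's phase space could work. It assumes one byte per component: three position offsets stored relative to the particle's mesh cell and three velocity components clipped to a fixed range. The constants `CELL_SIZE` and `V_MAX` and both function names are hypothetical and chosen only for illustration; the encoding actually used in the paper may differ.

```python
# Hypothetical sketch: 1 byte per phase-space component
# (3 position + 3 velocity = 6 bytes per particle).
# CELL_SIZE, V_MAX and the function names are illustrative assumptions.

CELL_SIZE = 1.0   # assumed width of one mesh cell
V_MAX = 1000.0    # assumed bound on each velocity component
LEVELS = 256      # number of values one byte can represent


def compress_particle(pos, vel, cell_origin):
    """Pack one particle into 6 bytes of fixed-point data."""
    out = bytearray()
    for x, x0 in zip(pos, cell_origin):
        frac = (x - x0) / CELL_SIZE              # offset inside the cell, in [0, 1)
        out.append(min(LEVELS - 1, max(0, int(frac * LEVELS))))
    for v in vel:
        clipped = max(-V_MAX, min(V_MAX, v))     # values outside the range saturate
        frac = (clipped + V_MAX) / (2 * V_MAX)   # map [-V_MAX, V_MAX] to [0, 1]
        out.append(min(LEVELS - 1, int(frac * LEVELS)))
    return bytes(out)


def decompress_particle(buf, cell_origin):
    """Recover approximate position and velocity from the 6-byte record."""
    # Decode to the centre of each quantization bin to halve the worst-case error.
    pos = tuple(x0 + (b + 0.5) / LEVELS * CELL_SIZE
                for b, x0 in zip(buf[:3], cell_origin))
    vel = tuple((b + 0.5) / LEVELS * 2 * V_MAX - V_MAX for b in buf[3:6])
    return pos, vel


origin = (3.0, 7.0, 1.0)
packed = compress_particle((3.25, 7.5, 1.9), (120.0, -450.0, 0.0), origin)
pos, vel = decompress_particle(packed, origin)
```

In this sketch the position error is bounded by half a quantization step, `CELL_SIZE / 512`, because the cell index itself carries the coarse part of the coordinate. The price is that every read and write passes through the pack and unpack routines, which is the kind of compression and decompression overhead the abstract reports reducing from 21% to 8% of total run time.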