Computer Science ›› 2020, Vol. 47 ›› Issue (8): 56-61. doi: 10.11896/jsjkx.200200112

Special topic: High Performance Computing


Design and Optimization of Two-level Particle-mesh Algorithm Based on Fixed-point Compression

CHENG Sheng-gan1, YU Hao-ran2, WEI Jian-wen1, James LIN1

  1. 1 Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China
    2 Department of Astronomy, Xiamen University, Xiamen, Fujian 361005, China
  • Online: 2020-08-15 Published: 2020-08-10
  • Corresponding author: James LIN (james@sjtu.edu.cn)
  • About author: CHENG Sheng-gan (chengshenggan@sjtu.edu.cn), born in 1997, bachelor. His main research interests include heterogeneous computing and parallel computing.
    James LIN, born in 1979, Ph.D., associate professor, is a senior member of China Computer Federation. His main research interests include HPC.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2016YFB0201800, 2018YFA0404603).

Abstract: Large-scale N-body simulation is essential to modern astrophysics and physical cosmology. One of the most widely used N-body algorithms is the particle-mesh (PM) method, but PM-based algorithms consume a large amount of memory, and this memory limit has become the bottleneck for scaling N-body simulations on modern supercomputers. This paper therefore uses fixed-point compression to reduce the memory needed to store the phase-space information of each N-body particle to as little as 6 bytes, nearly an order of magnitude lower than traditional PM-based algorithms. It implements a two-level particle-mesh algorithm with fixed-point compression and optimizes its performance with techniques including mixed-precision computation and communication optimization. These optimizations significantly reduce the performance loss caused by fixed-point compression: the share of compression and decompression in the total run time drops from 21% to 8%, and the core computing hotspots achieve up to a 2.3-fold speedup, so the program maintains high computational efficiency and scalability at low memory consumption.

Key words: Large-scale parallelism, Mixed-precision calculation, N-body simulation, Particle-mesh method
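
The abstract does not spell out the encoding, so the following is only a minimal sketch of how a 6-byte particle could be laid out, under stated assumptions: each of the three position components is stored as a one-byte offset inside its grid cell, each of the three velocity components is stored as one signed byte with linear quantization, and the cell index is kept outside the 6 bytes (for example, implied by storage order). The function names and the `v_max` clipping parameter are hypothetical and not taken from the paper.

```python
import math
import struct

POS_LEVELS = 256  # one unsigned byte per position component
VEL_LEVELS = 127  # one signed byte per velocity component
LAYOUT = "<3B3b"  # 3 unsigned bytes + 3 signed bytes = 6 bytes


def compress(pos, vel, v_max):
    """Pack one particle into 6 bytes; the cell index is returned separately.

    pos: three floats in grid units (cell width = 1).
    vel: three floats, clipped to [-v_max, v_max] (assumed range).
    """
    cell = tuple(int(math.floor(x)) for x in pos)
    # Offset inside the cell, quantized to 0..255.
    q_pos = [min(POS_LEVELS - 1, int((x - c) * POS_LEVELS))
             for x, c in zip(pos, cell)]
    # Velocity, clipped and linearly quantized to -127..127.
    q_vel = [round(max(-1.0, min(1.0, v / v_max)) * VEL_LEVELS) for v in vel]
    return cell, struct.pack(LAYOUT, *q_pos, *q_vel)


def decompress(cell, packed, v_max):
    """Recover approximate position and velocity from the 6-byte record."""
    q = struct.unpack(LAYOUT, packed)
    # Place the particle at the centre of its quantization bin.
    pos = tuple(c + (qi + 0.5) / POS_LEVELS for c, qi in zip(cell, q[:3]))
    vel = tuple(qi / VEL_LEVELS * v_max for qi in q[3:])
    return pos, vel


if __name__ == "__main__":
    cell, packed = compress((12.3, 0.999, 7.5), (0.25, -1.5, 3.0), v_max=2.0)
    print(len(packed), cell, decompress(cell, packed, v_max=2.0))
```

In this sketch the position error is at most 1/512 of a cell width and the velocity error is at most `v_max / 254` for values inside the clipping range. For comparison, six 4-byte floating-point values would take 24 bytes per particle. A production scheme may well use a different (for example, nonlinear) velocity mapping.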

CLC Number:

  • TP391