Computer Science, 2020, Vol. 47, Issue (8): 56-61. doi: 10.11896/jsjkx.200200112


Design and Optimization of Two-level Particle-mesh Algorithm Based on Fixed-point Compression

CHENG Sheng-gan (1), YU Hao-ran (2), WEI Jian-wen (1), James LIN (1)

  (1) Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China
  (2) Department of Astronomy, Xiamen University, Xiamen, Fujian 361005, China
  • Online: 2020-08-15   Published: 2020-08-10
  • About the authors: CHENG Sheng-gan, born in 1997, holds a bachelor's degree. His main research interests include heterogeneous computing and parallel computing.
    James LIN, born in 1979, Ph.D., associate professor, is a senior member of the China Computer Federation. His main research interests include high performance computing (HPC).
  • Supported by:
    This work was supported by the National Key R&D Program of China (2016YFB0201800, 2018YFA0404603).

Abstract: Large-scale N-body simulation is of great significance for the study of modern physical cosmology. One of the most popular N-body simulation algorithms is particle-mesh (PM). However, PM-based algorithms consume considerable memory, which becomes the bottleneck when scaling N-body simulations on modern supercomputers. This paper therefore proposes using fixed-point compression to reduce the memory footprint of each N-body particle to only 6 bytes, nearly an order of magnitude lower than in traditional PM-based algorithms. It implements the two-level particle-mesh algorithm with fixed-point compression and optimizes it with mixed-precision computation and communication optimizations. These optimizations significantly reduce the performance loss caused by fixed-point compression: the share of compression and decompression in the total run time drops from 21% to 8%, and the computing hotspots achieve up to a 2.3 times speedup, so the algorithm maintains high efficiency and scalability with low memory consumption.
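To illustrate how a particle could fit into 6 bytes, the following is a minimal sketch and not the paper's implementation. It assumes a split of three 1-byte position codes (the offset inside the particle's mesh cell) and three 1-byte velocity codes. The abstract does not state this split, and the linear velocity mapping, the `vmax` bound, and all function names are hypothetical. The cell indices are returned separately on the assumption that they can be recovered from the storage order of the particles rather than stored per particle.

```python
import struct

LEVELS = 256  # assumed: 2**8 quantisation bins per axis for a 1-byte code


def encode_offset(x, cell_size):
    """Split coordinate x into (cell index, 1-byte offset inside the cell)."""
    cell = int(x // cell_size)
    frac = x / cell_size - cell  # nominally in [0, 1)
    code = max(0, min(int(frac * LEVELS), LEVELS - 1))  # guard rounding edges
    return cell, code


def decode_offset(cell, code, cell_size):
    """Reconstruct the coordinate at the centre of the quantisation bin."""
    return (cell + (code + 0.5) / LEVELS) * cell_size


def encode_velocity(v, vmax):
    """Linear signed 1-byte code; values beyond +/-vmax are clamped."""
    q = round(v / vmax * 127)
    return max(-127, min(q, 127))


def decode_velocity(q, vmax):
    return q / 127 * vmax


def pack_particle(pos, vel, cell_size, vmax):
    """Return (cell indices, 6-byte record) for a 3-D position and velocity."""
    cells, codes = zip(*(encode_offset(x, cell_size) for x in pos))
    vcodes = [encode_velocity(v, vmax) for v in vel]
    # 3 unsigned bytes (position) + 3 signed bytes (velocity) = 6 bytes
    return cells, struct.pack("<3B3b", *codes, *vcodes)


def unpack_particle(cells, record, cell_size, vmax):
    fields = struct.unpack("<3B3b", record)
    pos = [decode_offset(c, k, cell_size) for c, k in zip(cells, fields[:3])]
    vel = [decode_velocity(q, vmax) for q in fields[3:]]
    return pos, vel
```

Under these assumptions the position error is at most half a bin, `cell_size / 512`, per axis. Every read pays a decode step, which is the kind of compression and decompression overhead the abstract reports reducing.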

Key words: Large-scale parallelism, Mixed-precision calculation, N-body simulation, Particle-mesh method

CLC Number: 

  • TP391