计算机科学 ›› 2017, Vol. 44 ›› Issue (10): 80-84.doi: 10.11896/j.issn.1002-137X.2017.10.015

• 2016 全国高性能计算学术年会 • 上一篇    下一篇

星系分组算法的并行设计与优化:SGI系统与分布式集群对比

司雨濛,韦建文,Simon SEE,林新华   

  1. 上海交通大学高性能计算中心 上海200240,上海交通大学高性能计算中心 上海200240,上海交通大学高性能计算中心 上海200240;NVIDIA 新加坡 138522,上海交通大学高性能计算中心 上海200240
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家重点研发计划(2016YFB0201400,2016YFB0201800),日本学术振兴会JSPS的RONPAKU项目,上海交通大学SMC-晨星青年学者奖励计划资助

Parallel Design and Optimization of Galaxy Group Finding Algorithm on Comparation of SGI and Distributed-memory Cluster

SI Yu-meng, WEI Jian-wen, Simon SEE and James LIN   

  • Online:2018-12-01 Published:2018-12-01

摘要: Halo-based Galaxy Group Finder (HGGF) 是一种有效的星系分组算法,它根据星系的空间位置、红移、质量等多种属性将星系分组,从而为星系组的形成与演化研究提供重要依据。但是,算法当前的OpenMP实现版本仅能利用单节点提供的资源,在大规模星系分组问题上的应用受到限制。一种优化思路是采用多机并行,使其可以利用更多资源来解决更大规模的星系分组问题,并缩短执行时间。因此,有必要对算法重新进行设计与实现。实现此目标的一大挑战是程序中存在大量半随机性远端内存访问,其在多机并行环境下会对性能造成重大影响。为克服这一难题,设计中提出了邻接星系链表思想,并采用Unified Parallel C (UPC)进行程序实现。对于核代码部分,使用4,8,16节点时,可分别取得2.25,2.78,5.07倍的加速比;同时,对单个节点的内存需求也显著减少。OpenMP版本在SGI UV 2000上的实验结果显示,受限于程序的访存特性与机器体系架构的特点,类似HGGF算法这种具有随机数据访问特征的程序,很难有效利用NUMA结构的共享内存系统中提供的大规模线程与内存资源来直接取得高加速比。在分布式内存集群上采用两级并行设计,以更好地利用局部性原理,可能是更好的解决方案。

关键词: 高性能计算,星系分组,并行计算,UPC,OpenMP

Abstract: Halo-based galaxy group finder (HGGF) is an effective algorithm that accomplishes the task of galaxy group finding based on galaxy coordinates,redshift and mass etc.,and provides great help in the research of galaxy group formation and evolution.However,current pure OpenMP implementation of the algorithm is limited by the resource of the underlying single compute node when dealing with large-scale group finding problems.One of the possible solutions is using resources from multiple nodes to reduce execution time while solving large-size galaxy group finding problem.Therefore,it is essential to redesign and implement the algorithm.The major hurdle for such an attempt is remoting memory access due to semi-random galaxy access in the algorithm which damages the performance in multi-node environment.To tackle such a problem,we paralleled the algorithm with adjacent galaxy list design and used unified parallel C (UPC) to implement it.2.25,2.78 and 5.07 times speedup for the kernel were achieved with 4,8 and 16 nodes respectively.Meanwhile,the memory requirement on each node was also reduced significantly.Experiments of OpenMP version of the algorithm on SGI UV 2000 show that due to the nature of the program and the features of NUMA architecture,programs with random memory access behavior like HGGF may not readily benefit from the large number of threads and shared memory provided by such machines.Two-level parallel design that takes advantage of locality principle on distributed memory clusters may be a better solution.

Key words: High performance computing,Galaxy group finding,Parallel computing,UPC,OpenMP

[1] KWON Y C,DYLAN N,JEFFREY G,et al.Scalable clusteringalgorithm for N-body simulations in a shared-nothing cluster[M]∥Scientific and Statistical Database Management.Springer Berlin Heidelberg,2010:132-150.
[2] YANG X H,MO H J,VAN DEN B,et al.A halo-based galaxy group finder:calibration and application to the 2dFGRS[J].Monthly Notices of the Royal Astronomical Society,2005,356(4):1293-1307.
[3] YANG X H,MO H J,VAN DEN B,et al.Galaxy groups in the SDSS DR4.I.The catalog and basic properties[J].The Astrophysical Journal,2007,671(1):153.
[4] WILLIAM C,et al.UPC Language Specifications Version 1.3https://upc-lang.org/assets/Uploads/spec/upc-lang-spec-1.3.pdf.
[5] Intel VTune Amplifier XE.https://software.intel.com/en-us/intel-vtune-amplifier-xe.
[6] Technical Advances in the SGI UV Architecture.https://www.sgi.com/pdfs/4192.pdf.
[7] HAO H,SI Y M,WEI J W,et al.Optimizing Irregular Memory Access in Astrophysical Clustering Studies[J].Journal of Frontiers of Computer Science and Technology,2017,11(1):80-90.(in Chinese) 郝赫,司雨蒙,韦建文,等.天体物理成团研究中的非规则访存优化[J].计算机科学与探索,2017,11(1):80-90.
[8] MARK D,GEORGE E,CAROS F,et al.The evolution of large-scale structure in a universe dominated by cold dark matter[J].The Astrophysical Journal,1985,292(2):371-394.
[9] NEAL K,LARS H,DAVID W.Galaxies and gas in a cold dark matter universe[J].The Astrophysical Journal,1992,399(2):L109-L112.
[10] EISENSTEIN D J,HUT P.Hop:A new group-finding algo-rithm for n-body simulations[J].The Astrophysical Journal,1998,498(1):137.
[11] LIU Y,LIAO W K,CHOUDHARY A.Design and evaluation of a parallel HOP clustering algorithm for cosmological simulation[C]∥International Parallel and Distributed Processing Symposium,2003.IEEE,2003.
[12] FU B,REN K,LPEZ J,et al.DiscFinder:A data-intensivescalable cluster finder for astrophysics[C]∥Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing.2010:348-351.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75, 88 .
[2] 夏庆勋,庄毅. 一种基于局部性原理的远程验证机制[J]. 计算机科学, 2018, 45(4): 148 -151, 162 .
[3] 厉柏伸,李领治,孙涌,朱艳琴. 基于伪梯度提升决策树的内网防御算法[J]. 计算机科学, 2018, 45(4): 157 -162 .
[4] 王欢,张云峰,张艳. 一种基于CFDs规则的修复序列快速判定方法[J]. 计算机科学, 2018, 45(3): 311 -316 .
[5] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[6] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[7] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[8] 刘琴. 计算机取证过程中基于约束的数据质量问题研究[J]. 计算机科学, 2018, 45(4): 169 -172 .
[9] 钟菲,杨斌. 基于主成分分析网络的车牌检测方法[J]. 计算机科学, 2018, 45(3): 268 -273 .
[10] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99, 116 .