Computer Science, 2017, Vol. 44, Issue (10): 85-90. doi: 10.11896/j.issn.1002-137X.2017.10.016

Evaluation of Resource Management Methods for Large High Energy Physics Computer Cluster

SUN Zhen-yu, SHI Jing-yan, JIANG Xiao-wei, ZOU Jia-heng and DU Ran   

Online: 2018-12-01  Published: 2018-12-01

Abstract: High energy physics data consist of many events that are independent of one another. A high energy physics computing task is therefore parallelized by running many jobs simultaneously, each processing a different data file, which makes high energy physics computing a typical high throughput computing scenario. The computer cluster at the Institute of High Energy Physics (IHEP) uses the open-source TORQUE/Maui for resource management and job scheduling. IHEP enforces a fair-use policy by dividing the cluster's computing resources into multiple queues and limiting the maximum number of running jobs per user. However, this leads to low overall resource utilization of the cluster. SLURM and HTCondor are both popular open-source resource management systems. SLURM offers a wide range of job scheduling policies, while HTCondor is well suited to high throughput computing. Both are candidates to replace the aging TORQUE/Maui, which lacks ongoing support, as the resource manager for computer clusters. In this paper, the job submission behavior of users from the Daya Bay experiment was simulated on SLURM and HTCondor test clusters to examine the resource allocation behavior and efficiency of the two systems. Their scheduling results were then compared with the actual scheduling results of the same jobs on the IHEP TORQUE/Maui cluster. Finally, the strengths and weaknesses of SLURM and HTCondor were analyzed, and the practicality of using either system to manage the IHEP computer cluster was discussed.

Key words: Resource management system, Job scheduler, Computer cluster, High throughput computing, High energy physics computing

