Computer Science ›› 2025, Vol. 52 ›› Issue (2): 261-267. doi: 10.11896/jsjkx.240200072

• Computer Network •

Two-stage Multi-factor Algorithm for Job Runtime Prediction Based on Usage Characteristics

SHANG Qiuyan1, LI Yicong2,3, WEN Ruilin1,2, MA Yinping1,2, OUYANG Rongbin1, FAN Chun1,2   

  1 Computer Center, Peking University, Beijing 100084, China
    2 Peking University Changsha Institute for Computing and Digital Economy, Changsha 410000, China
    3 College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610000, China
  • Received:2024-02-21 Revised:2024-04-04 Online:2025-02-15 Published:2025-02-17
  • About author: SHANG Qiuyan, born in 1999, postgraduate. Her main research interests include high-performance computing, scheduling algorithms and artificial intelligence.
    MA Yinping, born in 1993, postgraduate, engineer, is a member of CCF (No. L2132M). Her main research interests include high-performance computing and artificial intelligence.
  • Supported by:
    Special Funds for the Construction of Innovative Provinces in Hunan Province (2023GK1010). Computational resources were provided by the High-performance Computing Platform of Peking University.

Abstract: To address the chain of effects that inaccurate user-specified job runtimes have on the scheduling systems of high-performance computing platforms, a versatile two-stage multi-factor (TSMF) algorithm for job runtime prediction is proposed. TSMF integrates fine-grained user behavior patterns and job contextual features to deliver accurate and reliable predictions, and it can be embedded seamlessly into the scheduling systems of most high-performance computing platforms to enhance their performance. Multi-angle simulation experiments on the dataset and the real scheduling system of Peking University's high-performance computing clusters show that TSMF achieves high prediction accuracy on most jobs; for example, for up to 60.8% of jobs the prediction error is below one minute. Furthermore, TSMF significantly strengthens scheduling algorithms in practical scenarios, improving resource utilization and substantially reducing user waiting times.
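The abstract describes TSMF only at a high level (two prediction stages combining user behavior patterns with job contextual features) and does not name the models used in each stage. Below is a minimal, hypothetical sketch of how such a two-stage multi-factor predictor could be structured: a coarse runtime-bucket classifier followed by per-bucket regressors. The synthetic features, bucket boundaries, and scikit-learn gradient-boosting models are illustrative assumptions, not the authors' implementation.

# Hypothetical two-stage multi-factor runtime predictor (illustrative only,
# not the TSMF implementation described in the paper).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Assumed multi-factor inputs: user-behavior statistics plus job context.
X = np.column_stack([
    rng.uniform(60, 86400, n),    # user's mean runtime over recent jobs (s)
    rng.uniform(0.1, 1.0, n),     # user's historical walltime-estimate accuracy
    rng.integers(1, 128, n),      # requested CPU cores
    rng.uniform(600, 172800, n),  # requested walltime (s)
])
y = X[:, 0] * rng.uniform(0.5, 1.5, n)  # synthetic "true" runtime (s)

# Stage 1: classify each job into a coarse runtime bucket (short / medium / long).
buckets = np.digitize(y, [3600, 86400])
clf = GradientBoostingClassifier().fit(X, buckets)

# Stage 2: a separate regressor per bucket refines the runtime estimate.
regs = {b: GradientBoostingRegressor().fit(X[buckets == b], y[buckets == b])
        for b in np.unique(buckets)}

def predict_runtime(features):
    """Predict runtime in seconds: pick a bucket, then apply its regressor."""
    x = np.asarray(features, dtype=float).reshape(1, -1)
    b = int(clf.predict(x)[0])
    return float(regs[b].predict(x)[0])

print(predict_runtime(X[0]))

In a deployment of this kind, the predicted value could be handed to the scheduler in place of the user-supplied walltime, which is the role the abstract describes for TSMF within backfilling-style scheduling.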

Key words: High-performance computing, Job runtime prediction, Job scheduling, Behavior patterns, Machine learning

CLC Number: TP391
[1]LU P J,XIONG Z Y,LAI M C.Analysis of the current status of high-performance computing technology and standards[J].Computer Science,2023,50(11):1-7.
[2]ZRIGUI S,DE CAMARGO R Y,LEGRAND A,et al.Improving the performance of batch schedulers using online job runtime classification[J].Journal of Parallel and Distributed Computing,2022,164:83-95.
[3]MENEAR K,NAG A,PERR-SAUER J,et al.Mastering HPC Runtime Prediction:From Observing Patterns to a Methodological Approach[C]//Practice and Experience in Advanced Research Computing.2023:75-85.
[4]MARO R,BAI Y,BAHAR R I.Dynamically reconfiguring processor resources to reduce power consumption in high-performance processors[C]//Power-Aware Computer Systems:First International Workshop,PACS 2000 Cambridge,MA,USA,November 12,2000 Revised Papers 1.Springer Berlin Heidelberg,2001:97-111.
[5]FAN Y,LAN Z,CHILDERS T,et al.Deep reinforcement agent for scheduling in HPC[C]//2021 IEEE International Parallel and Distributed Processing Symposium(IPDPS).IEEE,2021:807-816.
[6]ZHANG D,DAI D,HE Y,et al.RLScheduler:an automated HPC batch job scheduler using reinforcement learning[C]//SC20:International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2020:1-15.
[7]FAN Y,LAN Z.DRAS-CQSim:A reinforcement learning based framework for HPC cluster scheduling[J].Software Impacts,2021,8:100077.
[8]TSAFRIR D,ETSION Y,FEITELSON D G.Backfilling using system-generated predictions rather than user runtime estimates[J].IEEE Transactions on Parallel and Distributed Systems,2007,18(6):789-803.
[9]PHINJAROENPHAN P,BEVINAKOPPA S,ZEEPHONGSEKUL P.A method for estimating the execution time of a parallel task on a grid node[C]//Advances in Grid Computing-EGC 2005:European Grid Conference,Amsterdam,The Netherlands,February 14-16,2005,Revised Selected Papers.Springer Berlin Heidelberg,2005:226-236.
[10]QUINLAN J R.Induction of decision trees[J].Machine Learning,1986,1:81-106.
[11]SMOLA A J,SCHÖLKOPF B.A tutorial on support vector regression[J].Statistics and Computing,2004,14:199-222.
[12]ROSENBLATT F.The perceptron:a probabilistic model for information storage and organization in the brain[J].Psychological Review,1958,65(6):386.
[13]YU X X,WEI J W,ZHANG Z B,et al.Research on machine learning to predict job running time of school-level computing teaching platform[J].Software Guide,2023,22(11):104-109.
[14]CHEN X,LU C D,PATTABIRAMAN K.Predicting job completion times using system logs in supercomputing clusters[C]//2013 43rd Annual IEEE/IFIP Conference on Dependable Systems and Networks Workshop(DSN-W).IEEE,2013:1-8.
[15]BAUM L E,PETRIE T,SOULES G,et al.A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains[J].The Annals of Mathematical Statistics,1970,41(1):164-171.
[16]WU G P,SHEN Y,ZHANG W S,et al.Job duration prediction for backfill optimization[J].Journal of Chinese Computer Systems,2019,40(1):6-12.
[17]ZHOU L F,YANG W Y,HAN Y G,et al.Job name hierarchical clustering algorithm predicts job running time[J].Journal of National University of Defense Technology,2022,44(5):13-23.
[18]ZHOU L,ZHANG X,YANG W,et al.PREP:Predicting job runtime with job running path on supercomputers[C]//Proceedings of the 50th International Conference on Parallel Processing.2021:1-10.
[19]CHEN T,GUESTRIN C.XGBoost:A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2016:785-794.
[20]KE G,MENG Q,FINLEY T,et al.LightGBM:A highly efficient gradient boosting decision tree[C]//Neural Information Processing Systems.Curran Associates Inc,2017.
[21]FRIEDMAN J H.Greedy function approximation:a gradient boosting machine[J].Annals of Statistics,2001:1189-1232.
[22]HU Q,SUN P,YAN S,et al.Characterization and prediction of deep learning workloads in large-scale GPU datacenters[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2021:1-15.
[23]BREIMAN L.Random forests[J].Machine Learning,2001,45:5-32.
[24]BOX G E P,JENKINS G M,REINSEL G C,et al.Time series analysis:forecasting and control[M].John Wiley & Sons,2015.