Computer Science ›› 2026, Vol. 53 ›› Issue (6): 171-184. doi: 10.11896/jsjkx.250800064

• High Performance Computing •

Workload Analysis and Modeling Method for High-performance Computing

WU Can, XIAO Haili, WANG Xiaoning, ZHAO Yining, LU Shasha, HE Rong   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-08-14 Revised: 2026-03-23 Online: 2026-06-15 Published: 2026-06-09
  • About author: WU Can, born in 1992, Ph.D., is a member of CCF (No. P7131M). Her main research interests include high-performance computing and distributed systems.
  • Supported by: National Key R&D Program of China (2023YFB3002302).

Abstract: In high-performance computing (HPC), machine-learning-driven scheduling algorithms are an increasingly active research focus, and their performance depends heavily on the quality of workload data. Research on workload analysis and modeling methods for HPC environments is therefore important for improving the efficiency and adaptability of scheduling algorithms. This study analyzes HPC workloads in depth using publicly available operational logs from supercomputing centers and establishes a comprehensive methodology for workload analysis and modeling. The DBSCAN algorithm is first employed to remove anomalous data, followed by a systematic analysis of job arrival patterns, the distribution of processor cores, the distribution of applications, the distribution of runtimes, and the correlations among different parameters. Based on these findings, a flexible HPC workload generator is developed; it supports both default parameter configurations and automated analysis of the characteristics of SWF-format workloads, to meet diverse simulation needs. Experimental validation demonstrates that the synthetic data generated by this tool matches real-world workload distributions more closely than randomly generated data does. The proposed generator can provide high-quality training data for developing new scheduling algorithms, contributing to improved resource utilization in supercomputing centers.
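
The cleaning step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Standard Workload Format (SWF) layout of 18 whitespace-separated fields per job, with `;` comment lines and `-1` for unknown values. The feature choice (log runtime, log processor count), the `eps` and `min_pts` values, the helper names `parse_swf`, `dbscan`, and `clean_jobs`, and the sample records are all hypothetical and chosen only for illustration.

```python
import math

# Made-up SWF records for illustration only (18 fields per job line).
SAMPLE_SWF = """\
; Version: 2.2  (lines starting with ';' are SWF header comments)
1 0 5 100 4 -1 -1 4 3600 -1 1 1 1 1 1 -1 -1 -1
2 10 3 120 4 -1 -1 4 3600 -1 1 1 1 1 1 -1 -1 -1
3 25 8 110 8 -1 -1 8 3600 -1 1 2 1 2 1 -1 -1 -1
4 40 2 130 8 -1 -1 8 3600 -1 1 2 1 2 1 -1 -1 -1
5 55 6 105 4 -1 -1 4 3600 -1 1 1 1 1 1 -1 -1 -1
6 70 4 125 8 -1 -1 8 3600 -1 1 3 1 2 1 -1 -1 -1
7 85 1 9000000 4096 -1 -1 4096 3600 -1 1 3 1 3 1 -1 -1 -1
8 90 1 -1 4 -1 -1 4 3600 -1 0 1 1 1 1 -1 -1 -1
"""

def parse_swf(text):
    """Return (job_id, submit_time, run_time, procs) for jobs with known values."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        f = line.split()
        # SWF fields (1-based): 1 job number, 2 submit time, 4 run time,
        # 5 number of allocated processors.
        job_id, submit, run_time, procs = int(f[0]), float(f[1]), float(f[3]), int(f[4])
        if run_time > 0 and procs > 0:  # skip unknown (-1) or zero values
            jobs.append((job_id, submit, run_time, procs))
    return jobs

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN (O(n^2)); returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels

def clean_jobs(jobs, eps=1.5, min_pts=3):
    """Drop jobs that DBSCAN marks as noise in (log10 runtime, log2 procs) space."""
    feats = [(math.log10(r), math.log2(p)) for _, _, r, p in jobs]
    labels = dbscan(feats, eps, min_pts)
    return [job for job, label in zip(jobs, labels) if label != -1]

cleaned = clean_jobs(parse_swf(SAMPLE_SWF))
print([job[0] for job in cleaned])
```

Log-scaling the features is a design choice of this sketch: runtimes and processor counts in HPC logs typically span several orders of magnitude, so a single `eps` is easier to pick in log space. A real trace would likely need tuned parameters and a faster neighbor search than this quadratic loop.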

Key words: High performance computing, Workload, Data analysis, Data modeling, Workload generator

CLC Number: TP311