Deep Learning Training Time Prediction Algorithm Integrating Multi-dimensional Operator Features

doi:10.11896/jsjkx.250900001

Abstract

Offline tasks are delay-tolerant workloads without strict completion-time requirements, typically batch processing or machine learning jobs. With the development of deep learning, deep learning tasks have become an important part of offline workloads in cloud data centers. Accurately predicting the runtime of offline tasks helps improve resource utilization during the idle periods of online tasks. However, deep learning models have diverse architectures and differ vastly in scale, and factors such as batch size, hyperparameters, and operator characteristics during training also significantly affect execution time. Existing methods struggle to account for all of these factors: configuration-based methods ignore the internal execution mechanism of the algorithm; operator-based methods neglect the impact of the computation graph structure; and graph-based methods either incur excessive model complexity with graph neural networks or lose dependency information when the graph is simplified to a topological sequence. To address the deficiencies of topological-sequence methods, this paper proposes MDOT (Multi-dimensional Operator Transformer), an algorithm that converts the computation graph into an operator sequence by topological sorting. First, based on this operator sequence, MDOT uses a Transformer to fuse three dimensions of operator information (operator type, operator configuration, and computational load) into a multi-dimensional operator encoding that models the execution characteristics of operators more comprehensively. Second, to capture the dependencies of the computation graph, MDOT designs a graph position encoding mechanism, which uses the Transformer's self-attention to capture the relationships among operators in the sequence and to model how operators influence one another's running time. Experimental results show that MDOT outperforms existing methods in predicting the training time of deep learning tasks, with mean absolute error and root mean square error 25% and 45% lower, respectively, than those of the second-best models.
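
The first step described above, turning a computation graph into an operator sequence by topological sorting, can be illustrated with a minimal sketch. The abstract does not say which sorting algorithm or tie-breaking rule MDOT uses, so Kahn's algorithm is an assumption here, and the operator names, the toy graph, and the function `topological_operator_sequence` are hypothetical.

```python
from collections import deque


def topological_operator_sequence(ops, edges):
    """Return operator names in a topological order (Kahn's algorithm).

    ops   -- operator names, i.e. nodes of a hypothetical computation graph
    edges -- (producer, consumer) pairs, i.e. data dependencies
    """
    indegree = {op: 0 for op in ops}
    successors = {op: [] for op in ops}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Operators with no unmet dependencies are ready to be emitted.
    ready = deque(op for op in ops if indegree[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in successors[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any operator left unemitted means the graph has a cycle.
    if len(order) != len(ops):
        raise ValueError("graph contains a cycle")
    return order


# Toy residual block (assumed for illustration): the input feeds both
# conv1 and the final add through a skip connection.
ops = ["input", "conv1", "relu", "conv2", "add"]
edges = [
    ("input", "conv1"),
    ("conv1", "relu"),
    ("relu", "conv2"),
    ("conv2", "add"),
    ("input", "add"),
]
sequence = topological_operator_sequence(ops, edges)
print(sequence)
```

The flat sequence respects every dependency but no longer records that `input` feeds `add` directly. This is the kind of dependency information the abstract says is lost by topological-sequence methods, and which MDOT's graph position encoding is designed to recover.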

Key words: Cloud computing, Execution time prediction, Deep learning, Operator, Offline task

CLC Number: TP312

CHEN Yuansheng, CHEN Shunjue, MO Xuan, WU Weigang, LI Jialun. Deep Learning Training Time Prediction Algorithm Integrating Multi-dimensional Operator Features [J]. Computer Science, 2026, 53(5): 129-136.

