Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241000036-5. doi: 10.11896/jsjkx.241000036

• Database & Big Data & Data Science •

Deep Neural Network-based Resource Allocation for Large-scale Operation Simulation

YE Shuai, LI Hao, SHI Peiteng, HUANG Yulin

  1. War Research Institute, Academy of Military Science, Beijing 100091, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Corresponding author: LI Hao (lihao07@nudt.edu.cn)
  • About the author: yeshuai09@nudt.edu.cn
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (62102446).

Abstract: With the development of artificial intelligence, operation experiments are becoming increasingly intelligent. Large-scale (large-sample) simulation is an important foundation for intelligent operation experiments and an effective way to handle their many variable factors and complex factor combinations. It is characterized by a large number of samples and demanding speed requirements. Running massive numbers of simulation samples at high speed depends on efficient scheduling of high-performance hardware clusters, but the computing resources required vary widely from sample to sample and are difficult to allocate manually. Accurately predicting and dynamically allocating the computing resources each sample needs is therefore key to improving the efficiency of large-scale simulation. To this end, this paper proposes a deep neural network (DNN)-based model for predicting the computing resources of large-scale operation simulation. The method first constructs a simulation resource management architecture with the DNN in the loop. It then extracts features from operation simulation sample files and trains a DNN prediction model on them. While a large-scale simulation is running, the model predicts online the computing resources each sample requires, enabling accurate prediction and dynamic allocation of resources for massive numbers of operation simulation jobs. Test results show that, in a typical operation experiment simulation scenario with samples numbering in the thousands, the proposed prediction model reduces the completion time on 10 high-performance server nodes by 20.8% compared with the traditional configuration method.

Key words: Deep neural network, Large-scale simulation, Resource prediction, Cluster management
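
The predict-then-allocate idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. `predict_cores` is a hypothetical stand-in for the trained DNN, and the `entities` feature, the 500-entities-per-core rule, the first-fit scheduler, and the assumption that a sample's runtime does not depend on how many cores it is granted are all simplifications made here for illustration only.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    name: str
    cores: int      # cores reserved for this sample
    runtime: float  # hypothetical runtime in seconds


def predict_cores(entities: int, max_cores: int = 8) -> int:
    """Hypothetical stand-in for the DNN predictor: maps one invented
    feature (entity count) to a core request between 1 and max_cores."""
    return max(1, min(max_cores, math.ceil(entities / 500)))


def makespan(samples: Sequence[Sample], n_nodes: int, cores_per_node: int) -> float:
    """Simulate first-fit placement in submission order and return the
    time at which the last sample finishes."""
    free = [cores_per_node] * n_nodes
    running = []  # min-heap of (finish_time, node_index, cores)
    now = 0.0
    for s in samples:
        if s.cores > cores_per_node:
            raise ValueError(f"{s.name} requests more cores than a node has")
        while True:
            # First node with enough free cores, if any.
            node = next((i for i, f in enumerate(free) if f >= s.cores), None)
            if node is not None:
                break
            # Nothing fits: advance time to the next completion and
            # release that sample's cores.
            finish, idx, cores = heapq.heappop(running)
            now = max(now, finish)
            free[idx] += cores
        free[node] -= s.cores
        heapq.heappush(running, (now + s.runtime, node, s.cores))
    return max((t for t, _, _ in running), default=now)


# Eight samples with 1000 entities each, 10 s runtime, on 2 nodes x 8 cores.
entity_counts = [1000] * 8
static = [Sample(f"s{i}", 8, 10.0) for i, _ in enumerate(entity_counts)]
predicted = [Sample(f"s{i}", predict_cores(n), 10.0)
             for i, n in enumerate(entity_counts)]

static_time = makespan(static, n_nodes=2, cores_per_node=8)
predicted_time = makespan(predicted, n_nodes=2, cores_per_node=8)
print(static_time, predicted_time)  # 40.0 10.0
```

In this toy setting, reserving a fixed 8 cores per sample lets only two samples run at once, while per-sample requests of 2 cores let all eight run concurrently. The numbers come from the invented workload above and say nothing about the 20.8% reduction reported in the abstract.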

CLC Number: 

  • TP391.9