Computer Science ›› 2023, Vol. 50 ›› Issue (10): 214-222. doi: 10.11896/jsjkx.220700121

• Artificial Intelligence •

  • Corresponding author: LAI Jun (2568754202@qq.com)
  • About author: (libseplzy@foxmail.com)

UAV Anti-tank Policy Training Model Based on Curriculum Reinforcement Learning

LIN Zeyang, LAI Jun, CHEN Xiliang, WANG Jun   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2022-07-12 Revised: 2022-11-15 Online: 2023-10-10 Published: 2023-10-10
  • About author: LIN Zeyang, born in 1995, postgraduate. His main research interests include deep reinforcement learning and curriculum learning. LAI Jun, born in 1979, postgraduate, associate professor. His main research interests include artificial intelligence and intelligent command and control.
  • Supported by:
    National Natural Science Foundation of China(61806221).


Abstract: In the intelligent era, the contest for the land battlefield is expanding from planar land control to three-dimensional land control, and UAV anti-tank operations will play a crucial role in contests for land control in future intelligent warfare. To address the decision-space explosion and sparse rewards that deep reinforcement learning faces in complex problem solving, this paper proposes a dynamic multi-agent curriculum learning method based on VDN. The method introduces curriculum learning into the training process of multi-agent deep reinforcement learning and combines it with the Stein variational gradient descent algorithm to improve the curriculum learning process, mitigating the poor initial training performance, long training times, and difficult convergence of reinforcement learning on complex tasks. Curriculum learning models are built in the multi-agent particle environment and in a UAV anti-tank combat scenario, realizing easy-to-hard transfer of both models and training prior knowledge. Experimental results show that, with the DyMA-CL curriculum learning mechanism improving the training process, reinforcement learning agents achieve better initial training performance, faster model convergence, and better final results when learning difficult tasks.
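The value-decomposition idea behind VDN, which the proposed method builds on, can be illustrated with a minimal sketch (not the paper's implementation; the agent count, action count, and randomly drawn per-agent utilities below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Per-agent utilities Q_i(o_i, a_i) for one fixed joint observation,
# standing in for the outputs of each agent's Q-network.
q_locals = rng.normal(size=(n_agents, n_actions))

# VDN's additive decomposition: Q_tot(a_1..a_n) = sum_i Q_i(a_i).
def q_tot(joint_action):
    return sum(q_locals[i, a] for i, a in enumerate(joint_action))

# Decentralised greedy execution: each agent maximises its own utility...
greedy = tuple(int(a) for a in q_locals.argmax(axis=1))

# ...which, because the decomposition is additive, also maximises Q_tot
# over the full joint action space.
best = max(np.ndindex(*(n_actions,) * n_agents), key=q_tot)
print(greedy == best)  # True: the argmax of the sum factorises per agent
```

Because the joint value is a plain sum of per-agent terms, each agent can act greedily on its own utility at execution time while training still optimises the joint return.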

Key words: Deep reinforcement learning, Curriculum learning, Stein variational gradient descent, Unmanned aerial vehicle, Anti-tank
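The Stein variational gradient descent algorithm named above can likewise be sketched in a toy 1-D form with a standard-normal target (the particle count, step size, and median-heuristic bandwidth are illustrative assumptions, not settings from the paper):

```python
import numpy as np

def svgd_step(x, grad_logp, eps=0.1):
    """One SVGD update for 1-D particles x with an RBF kernel."""
    diff = x[:, None] - x[None, :]                     # diff[j, i] = x_j - x_i
    # Median heuristic for the kernel bandwidth.
    h = np.median(np.abs(diff)) / np.sqrt(np.log(len(x) + 1)) + 1e-8
    k = np.exp(-diff**2 / (2 * h**2))                  # k(x_j, x_i)
    grad_k = -diff / h**2 * k                          # d k(x_j, x_i) / d x_j
    # phi(x_i) = mean_j [ k(x_j, x_i) * grad log p(x_j) + grad_k(x_j, x_i) ]
    phi = (k * grad_logp(x)[:, None] + grad_k).mean(axis=0)
    return x + eps * phi

# Target: standard normal, so grad log p(x) = -x.
grad_logp = lambda x: -x
x = np.linspace(4.0, 6.0, 20)        # particles start far from the target
for _ in range(500):
    x = svgd_step(x, grad_logp)
print("mean %.2f, std %.2f" % (x.mean(), x.std()))
```

The kernel-weighted gradient term drives the particles toward high-density regions of the target, while the kernel-gradient term acts as a repulsive force that keeps them spread out, so the particle set approximates the whole distribution rather than collapsing to its mode.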

CLC Number: TP181