Computer Science ›› 2023, Vol. 50 ›› Issue (10): 214-222. doi: 10.11896/jsjkx.220700121
LIN Zeyang, LAI Jun, CHEN Xiliang, WANG Jun
Abstract: In the era of intelligent warfare, the contest for the land battlefield is expanding from two-dimensional to three-dimensional land control, and UAV anti-tank operations will play a crucial role in the struggle for land dominance in future intelligent warfare. To address the decision-space explosion, sparse rewards and other problems that deep reinforcement learning faces when solving complex problems, this paper proposes a VDN-based dynamic multi-agent curriculum learning method. The method introduces curriculum learning into the training process of multi-agent deep reinforcement learning and combines it with the Stein variational gradient descent algorithm to improve the curriculum learning process, alleviating the poor initial training performance, long training time and difficult convergence of reinforcement learning in complex tasks. Curriculum learning models are built in the multi-agent particle environment and in a UAV anti-tank combat scenario, realizing the transfer of models and training priors from easy tasks to hard tasks. Experimental results show that, with the DyMA-CL curriculum learning mechanism improving the reinforcement learning training process, agents learning difficult tasks obtain better initial training performance and faster model convergence, and therefore achieve better final results.
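To make the value-decomposition and easy-to-hard transfer ideas above concrete, the following Python sketch (assuming PyTorch) shows a VDN-style joint Q-value formed as the sum of per-agent Q-values, trained stage by stage over an increasing number of agents in the spirit of DyMA-CL. This is a minimal illustrative sketch, not the authors' implementation: the AgentQNet network, the vdn_joint_q helper, the stage sizes in curriculum and all dimensions are assumptions, and the Stein variational gradient descent component and the combat environments are omitted.

```python
# Minimal sketch (not the authors' implementation): VDN-style value decomposition
# plus an easy-to-hard curriculum over the number of agents, in the spirit of DyMA-CL.
import torch
import torch.nn as nn


class AgentQNet(nn.Module):
    """Per-agent Q-network: maps a local observation to one Q-value per action."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def vdn_joint_q(per_agent_q, actions):
    """VDN mixing: Q_tot(s, a) = sum_i Q_i(o_i, a_i), i.e. the joint action value
    is the sum of each agent's Q-value for its own chosen action."""
    chosen = [q.gather(1, actions[:, i:i + 1]) for i, q in enumerate(per_agent_q)]
    return torch.stack(chosen, dim=0).sum(dim=0)  # shape: (batch, 1)


# Illustrative easy-to-hard curriculum: train on stages with more and more agents,
# reusing the shared per-agent network learned on the easier stage (the transfer of
# prior knowledge mentioned in the abstract). Stage sizes and dimensions are made up.
curriculum = [2, 4, 8]                       # hypothetical stage sizes, easy -> hard
obs_dim, n_actions, batch = 16, 5, 32
agent_net = AgentQNet(obs_dim, n_actions)    # parameters carried across stages

for n_agents in curriculum:
    # Dummy tensors standing in for transitions sampled from the current stage;
    # in real training, q_tot would feed a TD loss against a target network.
    obs = torch.randn(n_agents, batch, obs_dim)
    actions = torch.randint(0, n_actions, (batch, n_agents))
    per_agent_q = [agent_net(obs[i]) for i in range(n_agents)]
    q_tot = vdn_joint_q(per_agent_q, actions)
    print(f"stage with {n_agents} agents: Q_tot shape {tuple(q_tot.shape)}")
```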
[1] WANG Y C, QI W H, XU L Z. Security collaboration of UAV cluster based on blockchain[J]. Computer Science, 2021, 48(S2): 528-532, 546.
[2] SUN Y, LI Q W, XU Z X, CHEN X L. Air Combat Game Training Model Based on Multi-agent Deep Reinforcement Learning[J]. Command Information System and Technology, 2021, 12(2): 16-20.
[3] FOGLINO F, CHRISTAKOU C C, GUTIERREZ R L, et al. Curriculum learning for cumulative return maximization[J]. arXiv:1906.06178, 2019.
[4] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv:1312.5602, 2013.
[5] FANG M, ZHOU T, DU Y, et al. Curriculum-guided hindsight experience replay[J]. Advances in Neural Information Processing Systems, 2019, 32(3): 33-39.
[6] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[7] FAN X L, LI D, ZHANG W, et al. Research on Missile Evasive Decision Training Based on Deep Reinforcement Learning[J]. Electronics Optics & Control, 2021, 28(1): 81-85.
[8] SON K, KIM D, KANG W J, HOSTALLERO D E, et al. Learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//Proceedings of the International Conference on Machine Learning. PMLR, 2019: 532-546.
[9] CHU T, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(3): 1086-1095.
[10] CELLI A, CICCONE M, BONGO R, et al. Coordination in adversarial sequential team games via multi-agent deep reinforcement learning[J]. arXiv:1912.07712, 2019.
[11] FU Q, WANG G, LU W C, et al. Exploration and Practice of Intelligent Command and Control of Air Defense and Anti-missile[C]//Proceedings of the 8th China Command and Control Conference. Chinese Society for Command and Control, 2020: 321-334.
[12] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning[J]. arXiv:1706.05296, 2017.
[13] WANG X, CHEN Y, ZHU W. A Survey on Curriculum Learning[J]. arXiv:2010.13166, 2020.
[14] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning. ICML, 2009: 41-48.
[15] WANG W, YANG T, LIU Y, et al. From few to more: Large-scale dynamic multiagent curriculum learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 2020, 34(5): 7293-7300.
[16] BUSONIU L, BABUSKA R, DE SCHUTTER B. Multi-agent reinforcement learning: An overview[J]. Innovations in Multi-agent Systems and Applications, 2010, 26(2): 183-221.
[17] LIU Q, WANG D. Stein variational gradient descent: A general purpose Bayesian inference algorithm[J]. Advances in Neural Information Processing Systems, 2016, 29(3): 19-26.
[18] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. MIT Press, 2018.
[19] PUTERMAN M L. Markov decision processes: Discrete stochastic dynamic programming[M]. New Jersey: John Wiley & Sons, 2014.
[20] SZEPESVÁRI C. Algorithms for reinforcement learning[J]. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2010, 4(1): 1-103.
[21] YU Q, IKAMI D, IRIE G, et al. Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning[C]//European Conference on Computer Vision. Springer, 2020: 438-454.
[22] ZHANG M, QIN H, LAN M, et al. A high fidelity simulator for a quadrotor UAV using ROS and Gazebo[C]//41st Annual Conference of the IEEE Industrial Electronics Society (IECON 2015). IEEE, 2015: 2846-2851.
[23] LOWE R, WU Y I, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in Neural Information Processing Systems, 2017, 30(3): 65-78.