Computer Science, 2023, Vol. 50, Issue (10): 214-222. doi: 10.11896/jsjkx.220700121

• Artificial Intelligence •

UAV Anti-tank Policy Training Model Based on Curriculum Reinforcement Learning

LIN Zeyang, LAI Jun, CHEN Xiliang, WANG Jun   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2022-07-12 Revised: 2022-11-15 Online: 2023-10-10 Published: 2023-10-10
  • About author: LIN Zeyang, born in 1995, postgraduate. His main research interests include deep reinforcement learning and curriculum learning. LAI Jun, born in 1979, postgraduate, associate professor. His main research interests include artificial intelligence and intelligent command and control.
  • Supported by:
    National Natural Science Foundation of China (61806221).

Abstract: In the intelligent era, the contest for the land battlefield is expanding from planar land control to vertical land control, and UAV anti-tank operations will play a crucial role in the battle for land control in future intelligent warfare. Deep reinforcement learning faces problems such as decision-space explosion and sparse rewards when applied to complex tasks. To address them, this paper proposes a dynamic multi-agent curriculum learning method based on VDN, which adds curriculum learning to the training process of multi-agent deep reinforcement learning and combines it with the Stein variational gradient descent algorithm to improve the curriculum learning process. This addresses the poor initial training performance, long training time, and difficult convergence of reinforcement learning in complex tasks. In addition, curriculum learning models are constructed in the multi-agent particle environment and in a UAV anti-tank combat scenario, realizing the transfer of the model and of prior training knowledge from easy tasks to difficult ones. Experimental results show that the DyMA-CL curriculum learning mechanism improves the reinforcement learning training process: when learning difficult tasks, the agents obtain better initial training performance, faster model convergence, and better final performance.
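
The abstract names VDN as the base of the method but does not spell out what it does. As a minimal sketch of the general VDN idea only (not the authors' implementation), the joint action value is modelled as the sum of per-agent action values, so each agent can pick its own greedy action and the result is still the greedy joint action. The function names, the three-UAV setup, and all Q-values below are hypothetical and chosen purely for illustration.

```python
from itertools import product


def vdn_joint_q(per_agent_q, joint_action):
    """Additive decomposition: Q_tot is the sum of each agent's own Q-value."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent maximises its own Q-values independently."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


# Hypothetical Q-values: 3 agents (UAVs), 3 discrete actions each.
per_agent_q = [
    [0.1, 0.7, 0.2],
    [0.5, 0.4, 0.9],
    [0.3, 0.3, 0.8],
]

# Decentralised choice: one argmax per agent.
decentralised = greedy_joint_action(per_agent_q)

# Centralised check: search all 27 joint actions for the largest Q_tot.
brute_force = max(
    product(range(3), repeat=3),
    key=lambda joint: vdn_joint_q(per_agent_q, joint),
)

print(decentralised, brute_force, vdn_joint_q(per_agent_q, decentralised))
```

In this sketch the per-agent argmax and the exhaustive search agree, which is the property that lets an additive decomposition avoid enumerating the joint action space as the number of agents grows.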

Key words: Deep reinforcement learning, Curriculum learning, Stein variational gradient descent, Unmanned aerial vehicle, Anti-tank

CLC Number: 

  • TP181