Computer Science ›› 2024, Vol. 51 ›› Issue (1): 301-309.doi: 10.11896/jsjkx.230500146

• Artificial Intelligence •

Curriculum Learning Framework Based on Reinforcement Learning in Sparse Heterogeneous Multi-agent Environments

LUO Ruiqing1, ZENG Kun1, ZHANG Xinjing2   

  1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
    2 91976 Unit, People's Liberation Army of China, Guangzhou 510430, China
  • Received: 2023-05-22  Revised: 2023-09-20  Online: 2024-01-15  Published: 2024-01-12
  • About author: LUO Ruiqing, born in 1995, postgraduate. His main research interests include machine learning and reinforcement learning. ZENG Kun, born in 1982, Ph.D., associate professor. His main research interests include computer vision, machine learning, and graphics.
  • Supported by:
    National Natural Science Foundation of China (U1711266) and Guangdong Basic and Applied Basic Research Foundation (2019A1515011078).

Abstract: Modern battlefields are large and contain many types of units, and applying multi-agent reinforcement learning (MARL) to battlefield simulation can enhance collaborative decision-making among combat units and thus improve combat effectiveness. Current applications of MARL in military simulation often rely on two simplifications: homogeneous agents and densely distributed combat units. Real-world warfare scenarios do not always satisfy these assumptions and may involve heterogeneous agents and sparsely distributed combat units. To explore the potential of reinforcement learning in a wider range of scenarios, this paper makes improvements in both aspects. First, a multi-scale multi-agent amphibious landing environment (M2ALE) is designed to remove these two simplifications, incorporating various heterogeneous agents and scenarios with sparsely distributed combat units. These complex settings exacerbate the exploration difficulty and non-stationarity of multi-agent environments, making training with commonly used multi-agent algorithms difficult. Second, a heterogeneous multi-agent curriculum learning framework (HMACL) is proposed to address the challenges of the M2ALE environment. HMACL consists of three modules: a source task generating (STG) module, a class policy improving (CPI) module, and a Trainer module. The STG module generates source tasks to guide agent training, while the CPI module adopts a class-based parameter sharing strategy to mitigate the non-stationarity of the multi-agent system and to enable parameter sharing among heterogeneous agents. The Trainer module trains the latest policy with any MARL algorithm, using the source tasks generated by the STG module and the latest policy from the CPI module. HMACL alleviates the exploration difficulty and non-stationarity issues of commonly used MARL algorithms in the M2ALE environment and guides the learning process of the multi-agent system. Experiments show that HMACL significantly improves the sampling efficiency and final performance of MARL algorithms in the M2ALE environment.
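The abstract describes HMACL as three cooperating modules: an STG module that generates source tasks, a CPI module that shares parameters per agent class, and a Trainer that consumes both. The skeleton below is a minimal, hypothetical sketch of how such a loop could be wired together; every name (SourceTaskGenerator, ClassPolicyPool, the unit-count difficulty heuristic, the placeholder "gradient") is an illustrative assumption, not the paper's actual implementation.

```python
class SourceTaskGenerator:
    """STG module sketch: emits progressively harder source tasks.

    Difficulty is modeled here as the number of sparsely placed
    combat units; this heuristic is an assumption for illustration."""
    def __init__(self, max_units):
        self.max_units = max_units

    def generate(self, progress):
        # Scale task difficulty with training progress in [0, 1].
        n_units = 1 + int(progress * (self.max_units - 1))
        return {"n_units": n_units}


class ClassPolicyPool:
    """CPI module sketch: one shared policy per agent *class*, so
    heterogeneous agents of the same class share parameters."""
    def __init__(self, agent_classes):
        # One parameter vector per class (stand-in for a neural network).
        self.params = {c: [0.0] for c in agent_classes}

    def policy_for(self, agent_class):
        return self.params[agent_class]

    def improve(self, agent_class, gradient):
        # Placeholder update; the real Trainer would run a MARL algorithm.
        self.params[agent_class][0] += gradient


def train(agents_by_class, steps=10):
    """Trainer module sketch: consumes STG tasks, updates class policies."""
    stg = SourceTaskGenerator(max_units=8)
    cpi = ClassPolicyPool(agents_by_class.keys())
    for step in range(steps):
        task = stg.generate(progress=step / max(steps - 1, 1))
        for cls in agents_by_class:
            # Fake "gradient" from rolling out the task; illustrative only.
            cpi.improve(cls, gradient=0.1 * task["n_units"])
    return cpi


pool = train({"infantry": 3, "landing_craft": 2, "drone": 4})
print(sorted(pool.params))  # → ['drone', 'infantry', 'landing_craft']
```

The key design point sketched here is the class-based sharing: agents are keyed by class rather than by individual identity, so a heterogeneous system still benefits from parameter sharing within each class while keeping distinct classes separate.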

Key words: Multi-agent reinforcement learning, Combat simulation, Curriculum learning, Parameter sharing, Multi-agent environment design

CLC Number: TP183

[1]MORDATCH I,ABBEEL P.Emergence of grounded compositional language in multi-agent populations[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[2]SAMVELYAN M,RASHID T,DE WITT C S,et al.The StarCraft multi-agent challenge[J].arXiv:1902.04043,2019.
[3]TERRY J,BLACK B,GRAMMEL N,et al.Pettingzoo:Gym for multi-agent reinforcement learning[J].Advances in Neural Information Processing Systems,2021,34:15032-15043.
[4]WANG B H,WU T Y,LI W H,et al.Large-scale UAVs Confrontation Based on Multi-agent Reinforcement Learning[J].Journal of System Simulation,2021,33(8):1739-1753.
[5]ZHENG L,YANG J,CAI H,et al.Magent:A many-agent reinforcement learning platform for artificial collective intelligence[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[6]LI Y.Deep reinforcement learning:An overview[J].arXiv:1701.07274,2017.
[7]WATKINS C J C H.Learning from Delayed Rewards[D].Cambridge:King's College,University of Cambridge,1989.
[8]SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[9]RASHID T,SAMVELYAN M,DE WITT C S,et al.Monotonic value function factorisation for deep multi-agent reinforcement learning[J].The Journal of Machine Learning Research,2020,21(1):7234-7284.
[10]TERRY J K,GRAMMEL N,HARI A,et al.Revisiting parameter sharing in multi-agent deep reinforcement learning[J].arXiv:2005.13625,2020.
[11]GUPTA J K,EGOROV M,KOCHENDERFER M.Cooperative multi-agent control using deep reinforcement learning[C]//Autonomous Agents and Multiagent Systems:AAMAS 2017 Workshops,Best Papers,São Paulo,Brazil,May 8-12,2017,Revised Selected Papers 16.Springer International Publishing,2017:66-83.
[12]CHRISTIANOS F,PAPOUDAKIS G,RAHMAN M A,et al.Scaling multi-agent reinforcement learning with selective parameter sharing[C]//International Conference on Machine Learning.PMLR,2021:1989-1998.
[13]DORRI A,KANHERE S S,JURDAK R.Multi-agent systems:A survey[J].IEEE Access,2018,6:28573-28593.
[14]ZHENG Y,ZHU Y,WANG L.Consensus of heterogeneous multi-agent systems[J].IET Control Theory & Applications,2011,5(16):1881-1888.
[15]PORTELAS R,COLAS C,WENG L,et al.Automatic curriculum learning for deep rl:A short survey[J].arXiv:2003.04664,2020.
[16]LIU I J,JAIN U,YEH R A,et al.Cooperative exploration for multi-agent deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2021:6826-6836.
[17]DENNIS M,JAQUES N,VINITSKY E,et al.Emergent complexity and zero-shot transfer via unsupervised environment design[J].Advances in Neural Information Processing Systems,2020,33:13049-13061.
[18]YU W W,YANG X Y,LI H C,et al.Attentional Intention and Communication for Multi-Agent Learning[J].Acta Automatica Sinica,2021,47:1-16.
[19]ZANG R,WANG L,SHI T F.Multiagent reinforcement learning based on attentional message sharing[J].Journal of Computer Applications,2022,42(11):3346-3353.
[20]ZHAO Y P,FAN Z J.Research into The Evaluation Method of Naval Warfare Based on Simulation Deduction[J].Shipboard Electronic Countermeasure,2019,42(3):1-4.
[21]XIAO Z,ZHANG S Y.Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games[J].Journal of Software,2008,19(11):2957-2967.
[22]DU H W,CUI M L,HAN T,et al.Maneuvering decision in air combat based on multi-objective optimization and reinforcement learning[J].Journal of Beijing University of Aeronautics and Astronautics,2018,44(11):2247-2256.
[24]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[25]TAN M.Multi-agent reinforcement learning:Independent vs.cooperative agents[C]//Proceedings of the Tenth International Conference on Machine Learning.1993:330-337.
[26]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2016:1928-1937.
[27]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[28]LOWE R,WU Y I,TAMAR A,et al.Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Advances in Neural Information Processing Systems.2017.
[29]YU C,VELU A,VINITSKY E,et al.The surprising effectiveness of ppo in cooperative multi-agent games[J].Advances in Neural Information Processing Systems,2022,35:24611-24624.
[30]FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[31]SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[32]RASHID T,SAMVELYAN M,DE WITT C S,et al.Monotonic value function factorisation for deep multi-agent reinforcement learning[J].The Journal of Machine Learning Research,2020,21(1):7234-7284.
[33]CHRISTIANOS F,PAPOUDAKIS G,RAHMAN M A,et al.Scaling multi-agent reinforcement learning with selective para-meter sharing[C]//International Conference on Machine Learning.PMLR,2021:1989-1998.
[34]NARVEKAR S,SINAPOV J,LEONETTI M,et al.Source task creation for curriculum learning[C]//Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems.2016:566-574.
[35]DENNIS M,JAQUES N,VINITSKY E,et al.Emergent complexity and zero-shot transfer via unsupervised environment design[J].Advances in Neural Information Processing Systems,2020,33:13049-13061.
[36]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[J].arXiv:1511.05952,2015.
[37]ANDRYCHOWICZ M,WOLSKI F,RAY A,et al.Hindsight experience replay[C]//Advances in Neural Information Processing Systems.2017.