Computer Science ›› 2024, Vol. 51 ›› Issue (1): 301-309.doi: 10.11896/jsjkx.230500146

• Artificial Intelligence •

Curriculum Learning Framework Based on Reinforcement Learning in Sparse Heterogeneous Multi-agent Environments

LUO Ruiqing1, ZENG Kun1, ZHANG Xinjing2   

  1 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
    2 91976 Unit, People's Liberation Army of China, Guangzhou 510430, China
  • Received: 2023-05-22  Revised: 2023-09-20  Online: 2024-01-15  Published: 2024-01-12
  • About author: LUO Ruiqing, born in 1995, postgraduate. His main research interests include machine learning and reinforcement learning. ZENG Kun, born in 1982, Ph.D., associate professor. His main research interests include computer vision, machine learning, and graphics.
  • Supported by:
    National Natural Science Foundation of China (U1711266) and Guangdong Basic and Applied Basic Research Foundation (2019A1515011078).

Abstract: Modern battlefields are large and contain many types of units, and applying multi-agent reinforcement learning (MARL) to battlefield simulation can enhance collaborative decision-making among combat units and thus improve combat effectiveness. Current applications of MARL in military simulation often rely on two simplifications: homogeneous agents and densely distributed combat units. Real-world warfare scenarios do not always satisfy these assumptions and may involve heterogeneous agents and sparsely distributed combat units. To explore the potential of reinforcement learning in a wider range of scenarios, this paper makes improvements in both aspects. First, a multi-scale multi-agent amphibious landing environment (M2ALE) is designed to remove these two simplifications, incorporating various heterogeneous agents and scenarios with sparsely distributed combat units. These complex settings exacerbate the exploration difficulty and non-stationarity of multi-agent environments, making training with commonly used multi-agent algorithms difficult. Second, a heterogeneous multi-agent curriculum learning framework (HMACL) is proposed to address the challenges of the M2ALE environment. HMACL consists of three modules: a source task generating (STG) module, a class policy improving (CPI) module, and a Trainer module. The STG module generates source tasks to guide agent training, while the CPI module adopts a class-based parameter sharing strategy to mitigate the non-stationarity of the multi-agent system and to enable parameter sharing among heterogeneous agents. The Trainer module trains the latest policy with any MARL algorithm, using the source tasks generated by the STG module and the latest policy from the CPI module. HMACL alleviates the exploration difficulty and non-stationarity issues of commonly used MARL algorithms in the M2ALE environment and guides the learning process of the multi-agent system. Experiments show that HMACL significantly improves the sampling efficiency and final performance of MARL algorithms in the M2ALE environment.
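The abstract describes HMACL as three cooperating modules: an STG module that generates source tasks, a CPI module that shares parameters per agent class, and a Trainer that consumes both. The skeleton below is a minimal, hypothetical sketch of how such a loop could be wired together; every name (SourceTaskGenerator, ClassPolicyPool, the unit-count difficulty heuristic, the placeholder "gradient") is an illustrative assumption, not the paper's actual implementation.

```python
class SourceTaskGenerator:
    """STG module sketch: emits progressively harder source tasks.

    Difficulty is modeled here as the number of sparsely placed
    combat units; this heuristic is an assumption for illustration."""
    def __init__(self, max_units):
        self.max_units = max_units

    def generate(self, progress):
        # Scale task difficulty with training progress in [0, 1].
        n_units = 1 + int(progress * (self.max_units - 1))
        return {"n_units": n_units}


class ClassPolicyPool:
    """CPI module sketch: one shared policy per agent *class*, so
    heterogeneous agents of the same class share parameters."""
    def __init__(self, agent_classes):
        # One parameter vector per class (stand-in for a neural network).
        self.params = {c: [0.0] for c in agent_classes}

    def policy_for(self, agent_class):
        return self.params[agent_class]

    def improve(self, agent_class, gradient):
        # Placeholder update; the real Trainer would run a MARL algorithm.
        self.params[agent_class][0] += gradient


def train(agents_by_class, steps=10):
    """Trainer module sketch: consumes STG tasks, updates class policies."""
    stg = SourceTaskGenerator(max_units=8)
    cpi = ClassPolicyPool(agents_by_class.keys())
    for step in range(steps):
        task = stg.generate(progress=step / max(steps - 1, 1))
        for cls in agents_by_class:
            # Fake "gradient" from rolling out the task; illustrative only.
            cpi.improve(cls, gradient=0.1 * task["n_units"])
    return cpi


pool = train({"infantry": 3, "landing_craft": 2, "drone": 4})
print(sorted(pool.params))  # → ['drone', 'infantry', 'landing_craft']
```

The key design point sketched here is the class-based sharing: agents are keyed by class rather than by individual identity, so a heterogeneous system still benefits from parameter sharing within each class while keeping distinct classes separate.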

Key words: Multi-agent reinforcement learning, Combat simulation, Curriculum learning, Parameter sharing, Multi-agent environment design

CLC Number: TP183

[1]MORDATCH I,ABBEEL P.Emergence of grounded compositional language in multi-agent populations[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[2]SAMVELYAN M,RASHID T,DE WITT C S,et al.The StarCraft multi-agent challenge[J].arXiv:1902.04043,2019.
[3]TERRY J,BLACK B,GRAMMEL N,et al.Pettingzoo:Gym for multi-agent reinforcement learning[J].Advances in Neural Information Processing Systems,2021,34:15032-15043.
[4]WANG B H,WU T Y,LI W H,et al.Large-scale UAVs Confrontation Based on Multi-agent Reinforcement Learning[J].Journal of System Simulation,2021,33(8):1739-1753.
[5]ZHENG L,YANG J,CAI H,et al.Magent:A many-agent reinforcement learning platform for artificial collective intelligence[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[6]LI Y.Deep reinforcement learning:An overview[J].arXiv:1701.07274,2017.
[7]WATKINS C J C H.Learning from Delayed Rewards[D].Cambridge:King's College,University of Cambridge,1989.
[8]SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[9]RASHID T,SAMVELYAN M,DE WITT C S,et al.Monotonic value function factorisation for deep multi-agent reinforcement learning[J].The Journal of Machine Learning Research,2020,21(1):7234-7284.
[10]TERRY J K,GRAMMEL N,HARI A,et al.Revisiting parameter sharing in multi-agent deep reinforcement learning[J].arXiv:2005.13625,2020.
[11]GUPTA J K,EGOROV M,KOCHENDERFER M.Cooperative multi-agent control using deep reinforcement learning[C]//Autonomous Agents and Multiagent Systems:AAMAS 2017 Workshops,Best Papers,São Paulo,Brazil,May 8-12,2017,Revised Selected Papers 16.Springer International Publishing,2017:66-83.
[12]CHRISTIANOS F,PAPOUDAKIS G,RAHMAN M A,et al.Scaling multi-agent reinforcement learning with selective parameter sharing[C]//International Conference on Machine Learning.PMLR,2021:1989-1998.
[13]DORRI A,KANHERE S S,JURDAK R.Multi-agent systems:A survey[J].IEEE Access,2018,6:28573-28593.
[14]ZHENG Y,ZHU Y,WANG L.Consensus of heterogeneous multi-agent systems[J].IET Control Theory & Applications,2011,5(16):1881-1888.
[15]PORTELAS R,COLAS C,WENG L,et al.Automatic curriculum learning for deep rl:A short survey[J].arXiv:2003.04664,2020.
[16]LIU I J,JAIN U,YEH R A,et al.Cooperative exploration for multi-agent deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2021:6826-6836.
[17]DENNIS M,JAQUES N,VINITSKY E,et al.Emergent complexity and zero-shot transfer via unsupervised environment design[J].Advances in Neural Information Processing Systems,2020,33:13049-13061.
[18]YU W W,YANG X Y,LI H C,et al.Attentional Intention and Communication for Multi-Agent Learning[J].Acta Automatica Sinica,2021,47:1-16.
[19]ZANG R,WANG L,SHI T F.Multiagent reinforcement learning based on attentional message sharing[J].Journal of Computer Applications,2022,42(11):3346-3353.
[20]ZHAO Y P,FAN Z J.Research into The Evaluation Method of Naval Warfare Based on Simulation Deduction[J].Shipboard Electronic Countermeasure,2019,42(3):1-4.
[21]XIAO Z,ZHANG S Y.Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games[J].Journal of Software,2008,19(11):2957-2967.
[22]DU H W,CUI M L,HAN T,et al.Maneuvering decision in air combat based on multi-objective optimization and reinforcement learning[J].Journal of Beijing University of Aeronautics and Astronautics,2018,44(11):2247-2256.
[24]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[25]TAN M.Multi-agent reinforcement learning:Independent vs.cooperative agents[C]//Proceedings of the Tenth International Conference on Machine Learning.1993:330-337.
[26]MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2016:1928-1937.
[27]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[28]LOWE R,WU Y I,TAMAR A,et al.Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Advances in Neural Information Processing Systems.2017.
[29]YU C,VELU A,VINITSKY E,et al.The surprising effectiveness of ppo in cooperative multi-agent games[J].Advances in Neural Information Processing Systems,2022,35:24611-24624.
[30]FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[31]SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[32]RASHID T,SAMVELYAN M,DE WITT C S,et al.Monotonic value function factorisation for deep multi-agent reinforcement learning[J].The Journal of Machine Learning Research,2020,21(1):7234-7284.
[33]CHRISTIANOS F,PAPOUDAKIS G,RAHMAN M A,et al.Scaling multi-agent reinforcement learning with selective para-meter sharing[C]//International Conference on Machine Learning.PMLR,2021:1989-1998.
[34]NARVEKAR S,SINAPOV J,LEONETTI M,et al.Source task creation for curriculum learning[C]//Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems.2016:566-574.
[35]DENNIS M,JAQUES N,VINITSKY E,et al.Emergent complexity and zero-shot transfer via unsupervised environment design[J].Advances in Neural Information Processing Systems,2020,33:13049-13061.
[36]SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[J].arXiv:1511.05952,2015.
[37]ANDRYCHOWICZ M,WOLSKI F,RAY A,et al.Hindsight experience replay[C]//Advances in Neural Information Processing Systems.2017.