Computer Science ›› 2024, Vol. 51 ›› Issue (4): 280-290. doi: 10.11896/jsjkx.230600055

• Artificial Intelligence •

  • Corresponding author: CHEN Ying (selina.ychen@foxmail.com)
  • About author: (dxshi@nudt.edu.cn)

Multi-agent Reinforcement Learning Method Based on Observation Reconstruction

SHI Dianxi1,3, HU Haomeng2,3, SONG Linna2,3, YANG Huanhuan2,3, OUYANG Qianying1,3, TAN Jiefu3, CHEN Ying4

    1 Intelligent Game and Decision Lab(IGDL),Beijing 100091,China
    2 College of Computer,National University of Defense Technology,Changsha 410073,China
    3 Tianjin Artificial Intelligence Innovation Center,Tianjin 300457,China
    4 National Innovation Institute of Defense Technology,Beijing 100071,China
  • Received:2023-06-06 Revised:2023-11-07 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    Science and Technology Innovation 2030 Major Project(2020AAA0104802) and National Natural Science Foundation of China(91948303).


Abstract: Common knowledge is the set of knowledge that is commonly known within a multi-agent system. How to make full use of common knowledge for policy learning is a challenging problem in multi-agent independent learning systems. To address this problem, this paper proposes a multi-agent reinforcement learning method based on observation reconstruction, IPPO-CKOR, built around common knowledge extraction and the design of an independent learning network. First, common-knowledge features are computed from the agents' observations and fused with them, yielding observations that incorporate common-knowledge features. Second, an agent selection algorithm based on common knowledge picks closely related agents, and a reconstruction-based feature generation mechanism constructs their feature information; together with the fused observations, these reconstructed features form the reconstructed observations used for learning and executing agent policies. Third, an independent learning network based on observation reconstruction is designed: a multi-head self-attention mechanism processes the reconstructed observations, while one-dimensional convolution and GRU layers process the observation sequences, enabling agents to extract more effective features from observation sequences and effectively mitigating the impact of non-stationary environments and partial observability. Experimental results demonstrate that the proposed method significantly outperforms existing typical multi-agent reinforcement learning methods that use independent learning.
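The first two steps above (common-knowledge feature fusion, selection of closely related agents, and feature reconstruction) can be pictured with a small sketch. The snippet below is only an illustration under stated assumptions: the concatenation-based fusion, the cosine-similarity selection rule, and all function names are hypothetical and are not taken from the paper.

```python
# Hedged sketch of assembling a "reconstructed observation".
# Assumptions (not the paper's exact formulation): fusion is a simple
# concatenation, "closely related" agents are ranked by cosine similarity of
# common-knowledge features, and generated agent features are given vectors.
import numpy as np

def fuse_common_knowledge(obs, ck_features):
    # Fused observation = own observation concatenated with common-knowledge features.
    return np.concatenate([obs, ck_features])

def select_related_agents(own_ck, candidate_ck, k):
    # Rank candidate agents by cosine similarity of their common-knowledge
    # features to the current agent's, and keep the k most similar ones.
    sims = candidate_ck @ own_ck / (
        np.linalg.norm(candidate_ck, axis=1) * np.linalg.norm(own_ck) + 1e-8)
    return np.argsort(-sims)[:k]

def reconstruct_observation(obs, ck_features, candidate_ck, generated_features, k=2):
    fused = fuse_common_knowledge(obs, ck_features)
    related = select_related_agents(ck_features, candidate_ck, k)
    # Reconstructed observation = fused observation + generated features of related agents.
    return np.concatenate([fused] + [generated_features[i] for i in related])

# Toy usage: 8-dim observation, 4-dim common-knowledge features, 3 candidate agents.
rng = np.random.default_rng(0)
obs, ck = rng.normal(size=8), rng.normal(size=4)
candidate_ck = rng.normal(size=(3, 4))
generated = rng.normal(size=(3, 6))
print(reconstruct_observation(obs, ck, candidate_ck, generated).shape)  # (24,)
```

In the actual method the reconstructed features come from a learned generation mechanism rather than fixed vectors; the sketch only fixes the overall data flow.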
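The third step describes the independent learning network only at a high level: multi-head self-attention over the reconstructed observation, followed by a one-dimensional convolution and a GRU over the observation sequence. The PyTorch sketch below shows one plausible wiring under that description; the class name, layer sizes, token layout, and the policy head are assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the released IPPO-CKOR code) of an independent actor
# that attends over reconstructed-observation tokens per time step, then runs a
# 1D convolution and a GRU over the resulting observation sequence.
import torch
import torch.nn as nn


class ObservationReconstructionActor(nn.Module):
    def __init__(self, num_tokens, token_dim, hidden_dim, num_actions):
        super().__init__()
        # Multi-head self-attention over the reconstructed observation tokens
        # (own fused observation + features of the selected, closely related agents).
        self.attn = nn.MultiheadAttention(embed_dim=token_dim, num_heads=4, batch_first=True)
        self.proj = nn.Linear(num_tokens * token_dim, hidden_dim)
        # 1D convolution over the time axis of the observation sequence.
        self.conv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1)
        # GRU summarises the (partially observable) history into a recurrent state.
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_tokens, h0=None):
        # obs_tokens: (batch, time, num_tokens, token_dim)
        b, t, n, d = obs_tokens.shape
        x = obs_tokens.reshape(b * t, n, d)
        x, _ = self.attn(x, x, x)                      # mix information across tokens
        x = self.proj(x.reshape(b * t, n * d))         # (b*t, hidden_dim)
        x = x.reshape(b, t, -1).transpose(1, 2)        # (b, hidden_dim, t) for Conv1d
        x = torch.relu(self.conv(x)).transpose(1, 2)   # back to (b, t, hidden_dim)
        x, hn = self.gru(x, h0)                        # temporal features per step
        return self.policy_head(x), hn                 # action logits, recurrent state


# Minimal usage example with random data.
actor = ObservationReconstructionActor(num_tokens=5, token_dim=16, hidden_dim=64, num_actions=10)
logits, h = actor(torch.randn(2, 8, 5, 16))
print(logits.shape)  # torch.Size([2, 8, 10])
```

Applying attention per time step before the temporal convolution and GRU follows the order described in the abstract; the PPO-style training loop around this actor is not reproduced here.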

Key words: Observation reconstruction, Multi-agent cooperative strategy, Multi-agent reinforcement learning, Independent learning

CLC Number: TP391