Computer Science ›› 2024, Vol. 51 ›› Issue (4): 280-290. doi: 10.11896/jsjkx.230600055
SHI Dianxi1,3, HU Haomeng2,3, SONG Linna2,3, YANG Huanhuan2,3, OUYANG Qianying1,3, TAN Jiefu3, CHEN Ying4
Abstract: Common knowledge is the set of knowledge known to all agents in a multi-agent system. How to fully exploit common knowledge for policy learning is a challenging problem in independently learning multi-agent systems. To address this problem, focusing on common knowledge extraction and the design of an independent learning network, this paper proposes IPPO-CKOR, a multi-agent reinforcement learning method based on observation reconstruction. First, common-knowledge features are computed from each agent's observation and fused with it, yielding observations enriched with common-knowledge features. Second, a common-knowledge-based agent selection algorithm picks closely related agents, and a reconstructed-feature generation mechanism builds their feature information; together with the fused observations, these form the reconstructed observations used for policy learning and execution. Finally, an independent learning network based on observation reconstruction is designed: a multi-head self-attention mechanism processes the reconstructed observations, while one-dimensional convolution and GRU layers process the observation sequence, enabling agents to extract more effective features from observation sequences and effectively mitigating the impact of environment non-stationarity and partial observability. Experimental results show that the proposed method significantly outperforms existing typical independently learning multi-agent reinforcement learning methods.
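The processing pipeline described in the abstract (multi-head self-attention over the reconstructed observation, then 1-D convolution and a GRU over the observation sequence) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the identity Q/K/V projections, the depthwise 'valid' convolution, the pooling step, and all dimensions are simplifying assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads):
    # tokens: (n_tokens, d_model). Identity Q/K/V projections for brevity:
    # each head attends over its own slice of the feature dimension.
    n, d_model = tokens.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = tokens[:, h * d_head:(h + 1) * d_head]
        scores = q @ k.T / np.sqrt(d_head)      # (n, n) attention logits
        heads.append(softmax(scores) @ v)       # weighted sum of values
    return np.concatenate(heads, axis=-1)       # (n, d_model)

def conv1d_valid(seq, kernel):
    # seq: (T, d), kernel: (k, d); depthwise 1-D convolution, 'valid' padding.
    k = kernel.shape[0]
    return np.stack([(seq[t:t + k] * kernel).sum(axis=0)
                     for t in range(seq.shape[0] - k + 1)])

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    # One GRU update (biases omitted): update gate z, reset gate r.
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(x @ Wz + h @ Uz)
    r = sig(x @ Wr + h @ Ur)
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)
    return (1 - z) * h + z * h_tilde

# Reconstructed observation: one token per selected agent (5 agents, 8 features).
tokens = rng.standard_normal((5, 8))
attn_out = multi_head_self_attention(tokens, num_heads=2)      # (5, 8)

# Pool the attended tokens, then process a length-6 observation history.
history = rng.standard_normal((6, 8))
history[-1] = attn_out.mean(axis=0)                            # latest step
conv_out = conv1d_valid(history, rng.standard_normal((3, 8)))  # (4, 8)

# Roll the convolved sequence through a GRU to get the final feature vector.
d = 8
W = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
h = np.zeros(d)
for x in conv_out:
    h = gru_step(h, x, *W)  # final hidden state feeds the policy head
```

In a full method, `h` would feed the agent's policy and value heads; the recurrent state is what lets an independent learner smooth over partial observability across time steps.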