Computer Science ›› 2024, Vol. 51 ›› Issue (4): 280-290.doi: 10.11896/jsjkx.230600055

• Artificial Intelligence •

Multi-agent Reinforcement Learning Method Based on Observation Reconstruction

SHI Dianxi1,3, HU Haomeng2,3, SONG Linna2,3, YANG Huanhuan2,3, OUYANG Qianying1,3, TAN Jiefu3, CHEN Ying4

  1 Intelligent Game and Decision Lab (IGDL), Beijing 100091, China
    2 College of Computer, National University of Defense Technology, Changsha 410073, China
    3 Tianjin Artificial Intelligence Innovation Center, Tianjin 300457, China
    4 National Innovation Institute of Defense Technology, Beijing 100071, China
  • Received: 2023-06-06 Revised: 2023-11-07 Online: 2024-04-15 Published: 2024-04-10
  • Supported by:
    Science and Technology Innovation 2030 Major Project (2020AAA0104802) and the National Natural Science Foundation of China (91948303).

Abstract: Common knowledge is a set of knowledge that is known to every agent in a multi-agent system. How to make full use of common knowledge for policy learning is a challenging problem in multi-agent independent learning systems. To address this problem, this paper proposes IPPO-CKOR, a multi-agent reinforcement learning method based on observation reconstruction, focusing on common knowledge extraction and independent learning network design. First, the common knowledge features of the agents' observations are computed and fused to obtain fused observations carrying common knowledge features. Second, an agent selection algorithm based on common knowledge selects the most closely related agents, and a reconstruction-based feature generation mechanism constructs their feature information; the reconstructed observation, composed of the fused observation with common knowledge features, is used for learning and executing agent policies. Third, a network structure based on observation reconstruction is designed: it employs a multi-head self-attention mechanism to process the reconstructed observations, and uses one-dimensional convolution and GRU layers to handle observation sequences, enabling agents to extract more effective features from these sequences and effectively mitigating the impact of non-stationary environments and partial observability. Experimental results demonstrate that the proposed method outperforms existing typical multi-agent reinforcement learning methods that use independent learning.
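The selection-and-fusion step described in the abstract can be sketched as follows. This is purely an illustrative sketch, not the paper's actual algorithm: the function name, the visibility-matrix encoding of common knowledge, and the mean-based feature generation are all assumptions introduced here for concreteness.

```python
import numpy as np

def reconstruct_observation(obs, visible, agent_id, k=1):
    """Build a reconstructed observation for one agent (illustrative sketch).

    obs:     (n_agents, obs_dim) array of local observations.
    visible: (n_agents, n_entities) boolean matrix; visible[i, e] is True
             when entity e lies in agent i's field of view.  Entities seen
             by two agents at once stand in for their common knowledge.
    Returns the fused observation and the indices of the k selected agents.
    """
    # Size of the common-knowledge overlap between agent_id and every agent.
    overlap = (visible[agent_id] & visible).sum(axis=1)
    overlap[agent_id] = -1  # never select yourself
    # Pick the k most closely related agents (largest overlap first).
    related = np.argsort(-overlap, kind="stable")[:k]
    # "Generated" features for the related agents: here simply their mean
    # observation, appended to the agent's own observation.
    fused = np.concatenate([obs[agent_id], obs[related].mean(axis=0)])
    return fused, related
```

Under this toy encoding, the reconstructed observation grows by one block of generated features per fusion, and the attention/convolution/GRU network described above would then consume these reconstructed vectors as its input sequence.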

Key words: Observation reconstruction, Multi-agent cooperative strategy, Multi-agent reinforcement learning, Independent learning

CLC Number: 

  • TP391
[1] LI Y, XU F, XIE G Q, et al. Survey of development and application of multi-agent technology [J]. Computer Engineering and Applications, 2018, 54(9): 13-21.
[2] CLAUS C, BOUTILIER C. The dynamics of reinforcement learning in cooperative multiagent systems [C]//AAAI/IAAI. 1998.
[3] TAN M. Multi-agent reinforcement learning: Independent vs. cooperative agents [C]//Proceedings of the Tenth International Conference on Machine Learning. 1993: 330-337.
[4] LOWE R, WU Y I, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments [J]. Advances in Neural Information Processing Systems, 2017, 30: 6382-6393.
[5] YANG Y, LUO R, LI M, et al. Mean field multi-agent reinforcement learning [C]//International Conference on Machine Learning. PMLR, 2018: 5571-5580.
[6] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-Decomposition Networks for Cooperative Multi-Agent Learning Based On Team Reward [C]//AAMAS. 2018.
[7] RASHID T, SAMVELYAN M, SCHROEDER C, et al. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning [C]//International Conference on Machine Learning. PMLR, 2018: 4295-4304.
[8] KUBA J G, CHEN R, WEN M, et al. Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [C]//International Conference on Learning Representations. 2021.
[9] TAMPUU A, MATIISEN T, KODELJA D, et al. Multiagent cooperation and competition with deep reinforcement learning [J]. PLoS ONE, 2017, 12(4): e0172395.
[10] DE WITT C S, GUPTA T, MAKOVIICHUK D, et al. Is independent learning all you need in the StarCraft multi-agent challenge? [J]. arXiv:2011.09533, 2020.
[11] CHRISTIANOS F, SCHÄFER L, ALBRECHT S. Shared experience actor-critic for multi-agent reinforcement learning [J]. Advances in Neural Information Processing Systems, 2020, 33: 10707-10717.
[12] OSBORNE M J, RUBINSTEIN A. A course in game theory [M]. MIT Press, 1994.
[13] SCHROEDER DE WITT C, FOERSTER J, FARQUHAR G, et al. Multi-agent common knowledge reinforcement learning [C]//Neural Information Processing Systems. 2019.
[14] SAMVELYAN M, RASHID T, DE WITT C S, et al. The StarCraft multi-agent challenge [J]. arXiv:1902.04043, 2019.
[15] KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey [J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
[16] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation [J]. arXiv:1506.02438, 2015.
[17] HALPERN J Y, MOSES Y. Knowledge and common knowledge in a distributed environment [J]. Journal of the ACM, 1990, 37(3): 549-587.
[18] NAYYAR A, MAHAJAN A, TENEKETZIS D. Decentralized stochastic control with partial history sharing: A common information approach [J]. IEEE Transactions on Automatic Control, 2013, 58(7): 1644-1658.
[19] GUESTRIN C, VENKATARAMAN S, KOLLER D. Context-specific multiagent coordination and planning with factored MDPs [C]//AAAI/IAAI. 2002: 253-259.
[20] HU H, SHI D, YANG H, et al. Independent Multi-agent Reinforcement Learning Using Common Knowledge [C]//2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2022: 2703-2708.
[21] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. arXiv:1706.03762, 2017.
[22] HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780.
[23] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J]. arXiv:1412.3555, 2014.