Computer Science ›› 2025, Vol. 52 ›› Issue (9): 330-336. doi: 10.11896/jsjkx.240700107

• Artificial Intelligence •


Graph Attention-based Grouped Multi-agent Reinforcement Learning Method

ZHU Shihao1, PENG Kexing2, MA Tinghuai1,3   

  1. School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
    3. School of Computer Engineering, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
  • Received: 2024-07-16  Revised: 2024-11-04  Online: 2025-09-15  Published: 2025-09-11
  • Corresponding author: MA Tinghuai (thma@nuist.edu.cn)
  • About author: ZHU Shihao, born in 1997, master. His main research interest is reinforcement learning. E-mail: zhushihaosz@126.com.
    MA Tinghuai, born in 1974, Ph.D., professor, Ph.D. supervisor. His main research interests include data mining, social networks, privacy preserving and data sharing.
  • Supported by:
    National Natural Science Foundation of China(62372243,62102187).


Abstract: Currently, multi-agent reinforcement learning is widely applied in various cooperative tasks. In real environments, however, agents typically have access to only partial observations, leading to inefficient exploration of cooperative strategies. Moreover, because agents share a common reward, it is difficult to accurately assess individual contributions. To address these issues, a novel graph attention-based grouped multi-agent reinforcement learning framework is proposed, which improves cooperation efficiency and the evaluation of individual contributions. Firstly, a graph-structured multi-agent system is constructed, in which a graph attention network learns the relationships between each agent and its neighbors for information sharing. This expands individual agents' receptive fields, mitigating the constraints of partial observability and allowing individual contributions to be assessed effectively. Secondly, an action reference module is designed to provide joint-action reference information for individual action selection, enabling agents to explore more efficiently and diversely. Experimental results in two multi-agent control scenarios of different scales demonstrate significant advantages over baseline methods, and detailed ablation studies further verify the effectiveness of the graph attention grouping approach and the communication settings.

Key words: Multi-agent reinforcement learning, Graph attention network, Centralized training decentralized execution, Multi-agent cooperation, Multi-agent communication
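The abstract above outlines two components: graph-attention information sharing among neighboring agents and an action reference module for joint-action guidance. As a rough illustration of the first component only, the sketch below shows how attention weights over neighbors' observation embeddings could be computed and aggregated; the use of PyTorch, the class name, and the tensor sizes are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborAttention(nn.Module):
    # Illustrative attention over neighboring agents' observation embeddings.
    def __init__(self, obs_dim, embed_dim=64):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)
        self.key = nn.Linear(obs_dim, embed_dim)
        self.value = nn.Linear(obs_dim, embed_dim)

    def forward(self, own_obs, neighbor_obs):
        # own_obs: (obs_dim,); neighbor_obs: (n_neighbors, obs_dim)
        q = self.query(own_obs)                      # (embed_dim,)
        k = self.key(neighbor_obs)                   # (n_neighbors, embed_dim)
        v = self.value(neighbor_obs)                 # (n_neighbors, embed_dim)
        scores = (k @ q) / (k.shape[-1] ** 0.5)      # scaled dot-product scores
        weights = F.softmax(scores, dim=0)           # attention weights over neighbors
        message = weights @ v                        # aggregated shared information
        return weights, message

# Example: the weights could be thresholded to group an agent with its most
# relevant neighbors, and the aggregated message widens the agent's effective
# receptive field under partial observability.
att = NeighborAttention(obs_dim=16)
weights, message = att(torch.randn(16), torch.randn(5, 16))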

  • CLC Number: TP391

References
[1]LI L,ZHAO W,WANG C,et al.Nash double Q-based multi-agent deep reinforcement learning for interactive merging strategy in mixed traffic[J].Expert Systems with Applications,2024,237:121458.
[2]OROOJLOOY A,HAJINEZHAD D.A review of cooperative multi-agent deep reinforcement learning[J].Applied Intelligence,2023,53(11):13677-13722.
[3]LI T,ZHU K,LUONG N C,et al.Applications of multi-agent reinforcement learning in future internet:A comprehensive survey[J].IEEE Communications Surveys & Tutorials,2022,24(2):1240-1279.
[4]LIU Q,SZEPESVÁRI C,JIN C.Sample-efficient reinforcement learning of partially observable Markov games[C]//Advances in Neural Information Processing Systems.2022:18296-18308.
[5]ZHANG K,YANG Z,BAŞAR T.Multi-agent reinforcement learning:A selective overview of theories and algorithms[M]//Handbook of Reinforcement Learning and Control.2021:321-384.
[6]YARAHMADI H,SHIRI M E,NAVIDI H,et al.Bankruptcy-evolutionary games based solution for the multi-agent credit assignment problem[J].Swarm and Evolutionary Computation,2023,77:101229.
[7]JIANG K,LIU W,WANG Y,et al.Credit assignment in heterogeneous multi-agent reinforcement learning for fully cooperative tasks[J].Applied Intelligence,2023,53(23):29205-29222.
[8]FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018:2974-2982.
[9]KIM W,PARK J,SUNG Y.Communication in multi-agent reinforcement learning:Intention sharing[C]//International Conference on Learning Representations.2020:1-15.
[10]LIU Y,WANG W,HU Y,et al.Multi-agent game abstraction via graph attention neural network[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:7211-7218.
[11]NIU Y,PALEJA R R,GOMBOLAY M C.Multi-Agent Graph-Attention Communication and Teaming[C]//Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems.2021:964-973.
[12]RASHID T,SAMVELYAN M,DE WITT C S,et al.Monotonic value function factorisation for deep multi-agent reinforcement learning[J].Journal of Machine Learning Research,2020,21(178):1-51.
[13]SON K,KIM D,KANG W J,et al.Qtran:Learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//International Conference on Machine Learning.2019:5887-5896.
[14]NADERIALIZADEH N,HUNG F H,SOLEYMAN S,et al.Graph convolutional value decomposition in multi-agent reinforcement learning[J].arXiv:2010.04740,2020.
[15]WANG T,DONG H,LESSER V,et al.ROMA:multi-agent reinforcement learning with emergent roles[C]//Proceedings of the 37th International Conference on Machine Learning.2020:9876-9886.
[16]WANG Y,HAN B,WANG T,et al.Dop:Off-policy multi-agent decomposed policy gradients[C]//International Conference on Learning Representations.2020:1-24.
[17]DU Y,HAN L,FANG M,et al.Liir:Learning individual intrinsic reward in multi-agent reinforcement learning[C]//Advances in Neural Information Processing Systems.2019,32:1-12.
[18]MAHAJAN A,RASHID T,SAMVELYAN M,et al.Maven:Multi-agent variational exploration[C]//Advances in Neural Information Processing Systems.2019:1-12.
[19]SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward[C]//Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems.2018:2085-2087.
[20]WANG T,GUPTA T,MAHAJAN A,et al.RODE:Learning roles to decompose multi-agent tasks[J].arXiv:2010.01523,2020.
[21]JIANG J,LU Z.Learning attentional communication for multi-agent cooperation[C]//Advances in Neural Information Processing Systems.2018:1-11.
[22]WANG X,KE L,QIAO Z,et al.Large-scale traffic signal control using a novel multiagent reinforcement learning[J].IEEE Transactions on Cybernetics,2020,51(1):174-187.
[23]YANG S,YANG B,ZENG Z,et al.Causal inference multi-agent reinforcement learning for traffic signal control[J].Information Fusion,2023,94:243-256.
[24]SAMVELYAN M,RASHID T,SCHROEDER DE WITT C,et al.The StarCraft Multi-Agent Challenge[C]//Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems.2019:2186-2188.
[25]HAN Z R,QIAN Y H,LIU G Q.Multi Agent Communication Based on Self Attention and Reinforcement Learning[J].Journal of Chinese Computer Systems,2023,44(6):1134-1139.