Computer Science, 2025, Vol. 52, Issue (9): 330-336. doi: 10.11896/jsjkx.240700107
朱士昊1, 彭可兴2, 马廷淮1,3
ZHU Shihao1, PENG Kexing2, MA Tinghuai1,3
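Abstract: Multi-agent reinforcement learning is now widely used in a variety of cooperative tasks. In real environments, however, agents can usually obtain only partial observations, which makes the exploration of cooperative policies inefficient. In addition, agents share a single reward, which makes it difficult to measure each individual's contribution accurately. To address these problems, a grouped multi-agent reinforcement learning framework based on graph attention is proposed, which improves cooperation efficiency and the measurement of individual contributions. First, the multi-agent system is modeled as a graph, and a graph attention network learns the relationships between each agent and its neighbors so that they can share information. This enlarges each agent's receptive field, which alleviates the limits of partial observability and helps measure individual contributions. Second, an action reference module is designed that provides joint-action reference information for individual action selection, making exploration more efficient and more diverse. In two multi-agent control scenarios of different scales, the proposed method shows significant advantages over the baseline methods; ablation experiments further demonstrate the effectiveness of the graph-attention grouping method and the communication setting.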
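The neighbor-level information sharing described in the abstract can be illustrated with a generic single-head graph attention step. The sketch below is not the paper's implementation: the function name `graph_attention`, the toy chain graph, the random weights `W` and `a`, and the self-loop are all assumptions chosen for illustration, following the standard graph attention formulation (a shared linear projection, LeakyReLU-scored pairs, and a softmax over each agent's neighborhood).

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(features, neighbors, W, a):
    """One single-head graph attention step (illustrative, hypothetical API).

    features:  list of per-agent observation embeddings
    neighbors: dict mapping agent index -> list of neighbor indices
    W:         shared projection matrix, shape (out_dim, in_dim)
    a:         attention vector of length 2 * out_dim
    Returns (aggregated features, attention weights per agent).
    """
    projected = [matvec(W, h) for h in features]
    outputs, weights = [], {}
    for i, z_i in enumerate(projected):
        # Assumption: each agent also attends to itself (self-loop).
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # Score each (agent, neighbor) pair from the concatenated projections.
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, z_i + projected[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        # Attention-weighted sum of neighbor projections.
        agg = [
            sum(al * projected[j][k] for al, j in zip(alpha, nbrs))
            for k in range(len(z_i))
        ]
        outputs.append(agg)
        weights[i] = dict(zip(nbrs, alpha))
    return outputs, weights


# Toy setup: 4 agents on a chain graph, random embeddings and weights.
rng = random.Random(0)
n_agents, in_dim, out_dim = 4, 3, 2
features = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_agents)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
W = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
a = [rng.uniform(-1, 1) for _ in range(2 * out_dim)]

outputs, weights = graph_attention(features, neighbors, W, a)
```

Each entry of `outputs` mixes an agent's own projected observation with those of its neighbors, which is one way to read the "enlarged receptive field" in the abstract. The per-agent dictionaries in `weights` sum to 1 and indicate how strongly each neighbor is attended to; in a trained model, `W` and `a` would be learned rather than sampled.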
CLC number: