Computer Science, 2026, Vol. 53, Issue (1): 252-261. doi: 10.11896/jsjkx.250300145
段鹏婷1,2, 温超3, 王保平1, 王珍妮1
DUAN Pengting1,2, WEN Chao3, WANG Baoping1, WANG Zhenni1
Abstract: Multi-agent behavior decision-making methods offer broad application prospects in engineering, particularly for agent control in cooperative tasks. Policy-gradient-based reinforcement learning methods can directly model the distribution of agent policies, which is more conducive to exploring policy diversity under complex reward mechanisms, and they provide high sample efficiency in both discrete and continuous action spaces. Policy-gradient-based joint policy generation for multiple agents typically adopts mechanisms such as parameter sharing to improve convergence efficiency; however, such mechanisms lack modeling of action semantics and therefore struggle to overcome behavioral homogeneity among agents. To address this problem, this paper proposes an action sequence prediction method based on Collaborative Semantics Fusion (CSF) from a graph modeling perspective. CSF uses a graph autoencoder to learn semantic relations over the action space and obtain correlation-aware action semantic embeddings; information fusion is then achieved through interaction between agent behavior features and these semantic embeddings. This fusion aggregates the information of cooperatively related actions into the behavior representation of a specific agent, enabling exploration of a policy space in which the actions of multiple agents are mutually dependent. Experiments on multiple complex task scenarios in the StarCraft and Google Research Football environments show that CSF clearly outperforms existing state-of-the-art algorithms, verifying that the proposed method achieves efficient cooperation among agents.
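To illustrate the two components described in the abstract, the following is a minimal, hypothetical sketch in PyTorch: a graph autoencoder that produces correlation-aware action semantic embeddings, and an attention-style fusion of those embeddings with an agent's behavior features. The class names GraphAutoencoder and SemanticFusion, the layer sizes, and the placeholder action-relation graph are illustrative assumptions, not the paper's actual implementation.

# A minimal sketch under the assumptions stated above; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAutoencoder(nn.Module):
    """Encode the action-relation graph and reconstruct its adjacency matrix."""
    def __init__(self, n_actions: int, embed_dim: int):
        super().__init__()
        self.encoder = nn.Linear(n_actions, embed_dim)  # one-hot action features as node inputs

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor):
        z = F.relu(self.encoder(adj @ node_feats))      # one graph-convolution-style aggregation step
        adj_recon = torch.sigmoid(z @ z.t())            # inner-product decoder reconstructs adjacency
        return z, adj_recon                             # z: [n_actions, embed_dim] semantic embeddings

class SemanticFusion(nn.Module):
    """Attention fusion: agent features as query, action embeddings as key/value."""
    def __init__(self, agent_dim: int, embed_dim: int):
        super().__init__()
        self.query = nn.Linear(agent_dim, embed_dim)

    def forward(self, agent_feat: torch.Tensor, action_embed: torch.Tensor):
        # agent_feat: [batch, agent_dim]; action_embed: [n_actions, embed_dim]
        attn = torch.softmax(self.query(agent_feat) @ action_embed.t(), dim=-1)
        fused = attn @ action_embed                      # aggregate semantics of related actions
        return torch.cat([agent_feat, fused], dim=-1)    # semantics-augmented agent representation

# Usage example with made-up sizes: 9 candidate actions, 32-dim agent features.
n_actions, embed_dim, agent_dim = 9, 16, 32
gae = GraphAutoencoder(n_actions, embed_dim)
fusion = SemanticFusion(agent_dim, embed_dim)
adj = torch.eye(n_actions)                               # placeholder action-relation graph
z, adj_recon = gae(torch.eye(n_actions), adj)
fused_repr = fusion(torch.randn(4, agent_dim), z)        # -> [4, agent_dim + embed_dim]
recon_loss = F.binary_cross_entropy(adj_recon, adj)      # graph autoencoder reconstruction loss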