Computer Science ›› 2026, Vol. 53 ›› Issue (1): 252-261. doi: 10.11896/jsjkx.250300145

• Artificial Intelligence •


Collaborative Semantics Fusion for Multi-agent Behavior Decision-making

DUAN Pengting1,2, WEN Chao3, WANG Baoping1, WANG Zhenni1   

  1. School of Software, Northwestern Polytechnical University, Xi’an 710129, China;
    2. North Automatic Control Technology Institute, Taiyuan 030006, China;
    3. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
  • Received: 2025-03-27  Revised: 2025-06-05  Online: 2026-01-08
  • Corresponding author: WANG Baoping (baoping-wang@nwpu.edu.cn)
  • About author: DUAN Pengting, born in 1989, postgraduate (ptduan@mail.nwpu.edu.cn). Her main research interests include artificial intelligence, reinforcement learning and intelligent decision-making.
    WANG Baoping, born in 1964, Ph.D., professor, Ph.D. supervisor. His main research interests include artificial intelligence and radar signal processing.
  • Supported by:
    Key Research and Development Program of Shaanxi (2022ZDLGY03-02) and National Natural Science Foundation of China (62106134, 62476159).


Abstract: Multi-agent behavior decision-making methods have broad application prospects in engineering, particularly for agent control in cooperative tasks. Policy gradient-based reinforcement learning methods, which directly model policy distributions, are well suited to exploring diverse strategies under complex reward mechanisms and deliver consistently high empirical efficiency in both discrete and continuous action spaces. Although parameter-sharing mechanisms are widely adopted in policy gradient frameworks to improve convergence efficiency on collaborative tasks, they do not model action semantics and therefore struggle to mitigate action homogenization among agents. To address this issue, this paper proposes a Collaborative Semantics Fusion (CSF) method for behavior sequence prediction from a graph-based modeling perspective. CSF employs a graph autoencoder to learn correlation-aware semantic embeddings of the action space, and then fuses agent-specific behavioral features with these semantic embeddings. This fusion aggregates collaboratively related action information into each agent's latent representation, enabling interdependent exploration of the joint policy space across agents. Comprehensive experiments on multiple complex task scenarios in the StarCraft and Google Research Football environments show that CSF clearly outperforms state-of-the-art algorithms, validating its effectiveness in enabling efficient inter-agent collaboration.
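
As a rough illustration of the two ideas described in the abstract, the following is a minimal sketch assuming a PyTorch setting: a graph autoencoder that turns an action-relation graph into correlation-aware action embeddings, and an attention-style fusion of per-agent features with those embeddings before the policy head. Module names (ActionGraphAutoencoder, SemanticFusionPolicy), layer sizes, and the inner-product decoder are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionGraphAutoencoder(nn.Module):
    # Encodes an action-relation graph into semantic action embeddings and
    # reconstructs the adjacency with an inner-product decoder (assumed design).
    def __init__(self, n_actions, emb_dim=32):
        super().__init__()
        self.enc = nn.Linear(n_actions, emb_dim)  # single GCN-style layer

    def forward(self, x, adj):
        # x:   (n_actions, n_actions) initial action features (e.g. one-hot)
        # adj: (n_actions, n_actions) normalized action-relation graph
        z = F.relu(self.enc(adj @ x))             # correlation-aware action embeddings
        recon = torch.sigmoid(z @ z.t())          # reconstruction, trained to match adj
        return z, recon

class SemanticFusionPolicy(nn.Module):
    # Fuses per-agent observation features with the action embeddings and
    # outputs per-agent action log-probabilities.
    def __init__(self, obs_dim, n_actions, emb_dim=32, hidden=64):
        super().__init__()
        self.obs_enc = nn.Linear(obs_dim, emb_dim)
        self.policy = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_actions))

    def forward(self, obs, action_emb):
        # obs: (n_agents, obs_dim); action_emb: (n_actions, emb_dim)
        h = self.obs_enc(obs)                               # agent behavioral features
        attn = torch.softmax(h @ action_emb.t(), dim=-1)    # agent-to-action relevance
        fused = h + attn @ action_emb                       # aggregate related action semantics
        return torch.log_softmax(self.policy(fused), dim=-1)

In a full training loop the autoencoder would be optimized with a reconstruction loss against the action graph, while the fusion policy is trained with the usual policy gradient objective; how these two steps are scheduled is likewise an assumption of this sketch.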

Key words: Multi-agent reinforcement learning, Graph autoencoder, Semantic relations, Feature fusion, Behavior decision-making

CLC Number: TP181