Computer Science ›› 2026, Vol. 53 ›› Issue (1): 252-261. doi: 10.11896/jsjkx.250300145

• Artificial Intelligence •

Collaborative Semantics Fusion for Multi-agent Behavior Decision-making

DUAN Pengting1,2, WEN Chao3, WANG Baoping1, WANG Zhenni1   

    1 School of Software, Northwestern Polytechnical University, Xi’an 710129, China;
    2 North Automatic Control Technology Institute, Taiyuan 030006, China;
    3 Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
  • Received: 2025-03-27 Revised: 2025-06-05 Published: 2026-01-08
  • About author: DUAN Pengting, born in 1989, postgraduate. Her main research interests include artificial intelligence, reinforcement learning and intelligent decision-making.
    WANG Baoping, born in 1964, Ph.D., professor, Ph.D. supervisor. His main research interests include artificial intelligence and radar signal processing.
  • Supported by:
    Key Research and Development Program of Shaanxi (2022ZDLGY03-02) and National Natural Science Foundation of China (62106134, 62476159).

Abstract: Multi-agent decision-making has extensive engineering applications, particularly in cooperative control tasks. Policy gradient-based reinforcement learning methods, which directly model policy distributions, are well suited to exploring diverse strategies in complex reward scenarios, and they show consistently high empirical efficiency across both discrete and continuous action spaces. Parameter-sharing mechanisms are widely adopted in policy gradient frameworks to improve convergence efficiency on collaborative tasks, but they pay little attention to action semantic modeling, which makes it hard to mitigate action homogenization among agents. To address this issue, this paper proposes the Collaborative Semantics Fusion (CSF) method from a graph-based modeling perspective. CSF employs a graph autoencoder to learn correlation-aware semantic embeddings within the action space, then fuses information by dynamically integrating agent-specific behavioral features with these semantic embeddings. This fusion mechanism aggregates collaborative behavioral information into agent-specific latent representations, enabling interdependent policy space exploration across agents. Experiments on diverse complex task scenarios in the StarCraft and Google Research Football environments show that CSF outperforms state-of-the-art algorithms, validating its effectiveness in facilitating inter-agent collaboration.

Key words: Multi-agent reinforcement learning, Graph autoencoder, Semantic relations, Feature fusion, Behavior decision-making
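
The two steps named in the abstract (a graph autoencoder over the action space, followed by fusion of agent features with the resulting action embeddings) can be illustrated with a rough sketch. This is not the paper's implementation: the action graph `adj`, the layer sizes, the untrained random weights, and the dot-product-plus-softmax fusion in `fuse` are all assumptions chosen for illustration, and a real system would train the encoder by minimizing the reconstruction error between `adj_rec` and `adj`.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def encode_actions(adj, features, w1, w2):
    """Two-layer graph-convolution encoder: one embedding row per action."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # ReLU
    return a_norm @ hidden @ w2

def decode_adjacency(z):
    """Inner-product decoder: predicted link probability between actions."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def fuse(agent_features, action_embeddings):
    """Hypothetical fusion: score every action for every agent by a dot
    product with the action embeddings, then apply a row-wise softmax."""
    logits = agent_features @ action_embeddings.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_actions, n_agents, d_hidden, d_embed = 5, 3, 8, 4

# Assumed action-correlation graph (symmetric, no self-loops).
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
features = np.eye(n_actions)                       # one-hot action features

# Untrained weights; training would fit them to reconstruct `adj`.
w1 = rng.normal(scale=0.5, size=(n_actions, d_hidden))
w2 = rng.normal(scale=0.5, size=(d_hidden, d_embed))

z = encode_actions(adj, features, w1, w2)          # action embeddings
adj_rec = decode_adjacency(z)                      # reconstructed graph
agent_features = rng.normal(size=(n_agents, d_embed))
policy = fuse(agent_features, z)                   # per-agent action distribution
```

In this sketch each row of `policy` is a probability distribution over actions for one agent. Because all agents score the same shared embeddings `z`, correlated actions receive correlated scores, which is the intuition behind coupling agents through action semantics.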

CLC Number: TP181