Computer Science ›› 2025, Vol. 52 ›› Issue (8): 317-325. doi: 10.11896/jsjkx.240900012

• Artificial Intelligence •


Congestion-aware and Cached Communication for Multi-agent Pathfinding

ZHANG Yongliang1, LI Ziwen1, XU Jiahao1, JIANG Yuchen2, CUI Ying1

  1. College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310014, China
    2. School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China
  • Received: 2024-09-02  Revised: 2024-11-22  Online: 2025-08-15  Published: 2025-08-08
  • Corresponding author: ZHANG Yongliang (titanzhang@zjut.edu.cn)
  • About author: ZHANG Yongliang, born in 1977, Ph.D, associate professor. His main research interests include biometric recognition, artificial intelligence, and multi-robot systems.
  • Supported by:
    National Natural Science Foundation of China (62102364) and Natural Science Foundation of Zhejiang Province (LY22F020016).


Abstract: Multi-agent Path Finding (MAPF) is an essential component of large-scale robotic systems. Traditional planners based on conflict search are limited in scalability by their computation time, whereas multi-agent reinforcement learning strategies built on communication mechanisms significantly alleviate this issue. As task scale grows, however, communicating effectively and avoiding congestion become the main obstacles for learning-based methods. To overcome these challenges, this paper proposes a decentralized planner with cache-based communication and congestion-awareness capabilities, called Congestion-aware and Cached Communication for Multi-agent Pathfinding (C3MAP), which maintains an excellent success rate while reasonably reducing communication frequency. Specifically, an agent broadcasts to the agents within its local field of view only when its current observation differs significantly from the content of its previous broadcast, or when it receives a request signal from another agent. In addition, congestion information is incorporated into the local observation to guide agents away from congested areas. Experimental results on standard benchmarks show that C3MAP achieves success rates above 90% in structured scenarios, significantly outperforming existing learning-based methods, and experiments in large-scale environments further confirm the stability of the cached communication mechanism and the effectiveness of congestion awareness.
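
The sketch below illustrates the two mechanisms from the abstract in Python. It is an illustration under assumptions, not the paper's implementation: the function names (should_broadcast, congestion_map), the normalized-L2 test for a "significant difference", the 0.1 threshold, and the 9x9 field of view are all placeholders for the criteria defined in the full text.

```python
import numpy as np

def should_broadcast(obs, cached_obs, request_received, threshold=0.1):
    """Event-triggered broadcast decision for one agent.

    An agent re-broadcasts to neighbors in its field of view only when
    (a) a neighbor sent an explicit request signal, or (b) its current
    observation has drifted significantly from the one it last sent.
    """
    if request_received:
        return True            # a neighbor asked for fresh information
    if cached_obs is None:
        return True            # nothing cached yet: broadcast once
    # Normalized L2 drift between the current and last-broadcast
    # observations; this metric and the threshold are assumptions,
    # not the paper's exact significance criterion.
    drift = np.linalg.norm(obs - cached_obs) / (np.linalg.norm(cached_obs) + 1e-8)
    return drift > threshold

def congestion_map(agent_cells, center, fov=9):
    """Local congestion channel: agent density over the fov x fov window
    around `center`, appended to the agent's observation stack."""
    grid = np.zeros((fov, fov), dtype=np.float32)
    r0, c0 = center[0] - fov // 2, center[1] - fov // 2
    for r, c in agent_cells:
        if 0 <= r - r0 < fov and 0 <= c - c0 < fov:
            grid[r - r0, c - c0] += 1.0
    return grid / max(len(agent_cells), 1)  # normalize to a density in [0, 1]
```

Keeping the last-broadcast observation as a cache turns communication into an event-triggered process: bandwidth is spent only when a neighbor's knowledge would otherwise go stale, which is how the success rate can be maintained while communication frequency drops.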

Key words: Multi-agent system, Path planning, Deep reinforcement learning, Congestion awareness, Cached communication

CLC number: TP391