Computer Science, 2025, Vol. 52, Issue (8): 317-325. DOI: 10.11896/jsjkx.240900012
ZHANG Yongliang1, LI Ziwen1, XU Jiahao1, JIANG Yuchen2, CUI Ying1
Abstract: Multi-Agent Path Finding (MAPF) is a key component of large-scale robotic systems. Traditional planners based on conflict search are limited by computation time and therefore scale poorly, whereas multi-agent reinforcement learning policies built on communication mechanisms have significantly alleviated this problem. As task scale grows, however, communicating effectively and avoiding congestion become the main obstacles for learning-based methods. To address these issues, this paper proposes C3MAP, a distributed planner with cached communication and congestion awareness, which maintains an excellent success rate while reasonably reducing communication frequency. Specifically, an agent broadcasts to the agents within its local field of view if and only if its current observable information differs significantly from the content of its last communication, or it receives a broadcast-request signal from another agent; meanwhile, congestion information is introduced into the local observation to guide agents away from congested regions. Experimental results on standard benchmarks show that C3MAP achieves success rates above 90% in structured scenarios, significantly outperforming existing learning-based methods; large-scale experiments further verify the superior stability of the cached-communication mechanism and the effectiveness of congestion awareness.
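The cached-communication trigger described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the threshold `delta`, the flat observation encoding, the congestion feature placement, and all helper names are assumptions made for the example.

```python
import numpy as np


class CachedCommAgent:
    """Minimal sketch of a cached-communication trigger: broadcast only
    when the current local observation deviates significantly from the
    last broadcast content, or when a broadcast request arrives from a
    neighboring agent (hypothetical encoding, not the paper's code)."""

    def __init__(self, delta: float = 0.5):
        self.delta = delta      # deviation threshold (assumed hyperparameter)
        self.last_sent = None   # cached content of the last broadcast

    def should_broadcast(self, obs: np.ndarray, request_received: bool) -> bool:
        # Always answer an explicit broadcast request from a neighbor.
        if request_received:
            return True
        # Nothing cached yet (first step), so broadcast.
        if self.last_sent is None:
            return True
        # Broadcast iff the observation has drifted beyond the threshold.
        return np.linalg.norm(obs - self.last_sent) > self.delta

    def step(self, obs: np.ndarray, request_received: bool = False):
        """Return the message to broadcast, or None to stay silent."""
        if self.should_broadcast(obs, request_received):
            self.last_sent = obs.copy()   # refresh the cache
            return obs                    # message = current local observation
        return None                       # cache still valid: no communication


# Usage: congestion info appended to the local observation (assumed layout:
# 8 field-of-view features plus 1 scalar congestion value).
agent = CachedCommAgent(delta=0.5)
obs = np.concatenate([np.zeros(8), [0.3]])
print(agent.step(obs) is not None)         # True: first observation is broadcast
print(agent.step(obs + 0.01) is not None)  # False: change is below the threshold
```

Under this reading, communication cost scales with how fast observations change rather than with the number of timesteps, which is consistent with the reduced communication frequency claimed in the abstract.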