Computer Science ›› 2024, Vol. 51 ›› Issue (7): 319-326. doi: 10.11896/jsjkx.230600129
WANG Xianwei1, FENG Xiang1,2, YU Huiqun1,2
Abstract: Dynamic obstacles have long been a key factor hindering the development of autonomous navigation for intelligent agents, and avoiding obstacles and clearing obstacles are two effective ways to deal with them. In recent years, the multi-agent dynamic obstacle avoidance problem has attracted wide attention from researchers, and many strong multi-agent obstacle-avoidance algorithms have emerged. In contrast, the multi-agent dynamic obstacle clearance problem has received little attention, and corresponding multi-agent clearance algorithms are scarce. To address this problem, this paper proposes a multi-agent cooperative obstacle clearance algorithm based on deep deterministic policy gradient and an attention Critic (Multi-Agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic, MACOC). First, the first environment model for multi-agent cooperative obstacle clearance is created, the kinematic models of the agents and the dynamic obstacles are defined, and four simulation environments are constructed with different numbers of agents and dynamic obstacles. Second, the multi-agent cooperative obstacle clearance process is formulated as a Markov decision process (MDP), and the agents' state space, action space, and reward function are constructed. Finally, the proposed MACOC algorithm is compared with classic multi-agent reinforcement learning algorithms in the cooperative obstacle clearance simulation environments. Experiments show that, compared with the baseline algorithms, MACOC achieves a higher clearance success rate, clears obstacles faster, and adapts better to complex environments.
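To make the attention-Critic idea concrete, the sketch below shows the core mechanism in plain Python: a centralized critic scores how relevant each other agent's encoded state-action is to the current agent via scaled dot-product attention, then aggregates their value features by those weights. This is a hypothetical toy illustration under assumed names and dimensions, not the paper's actual MACOC implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pool(query, keys, values):
    """Scaled dot-product attention: weight each other agent's
    value features by their relevance to the querying agent."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of the other agents' value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# Toy example: agent 0's query attends over two other agents.
query = [1.0, 0.0]                 # encoded state-action of agent 0
keys = [[1.0, 0.0], [0.0, 1.0]]    # encodings of agents 1 and 2
values = [[2.0], [4.0]]            # their value features
pooled = attention_pool(query, keys, values)
```

The pooled vector would then be concatenated with the agent's own encoding and fed into the critic's value head; the attention weights let each agent's critic focus on teammates whose behavior most affects its return.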