计算机科学 ›› 2024, Vol. 51 ›› Issue (7): 319-326. doi: 10.11896/jsjkx.230600129

• 人工智能 •

基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法

王宪伟1, 冯翔1,2, 虞慧群1,2   

  1. 华东理工大学计算机科学与工程系 上海 200237
  2. 上海智慧能源工程技术研究中心 上海 200237
  • 收稿日期:2023-06-16 修回日期:2023-11-16 出版日期:2024-07-15 发布日期:2024-07-10
  • 通讯作者: 冯翔(xfeng@ecust.edu.cn)
  • 作者简介:王宪伟(y30211041@mail.ecust.edu.cn)
  • 基金资助:
    国家自然科学基金面上项目(62276097);国家自然科学基金重点项目(62136003);国家重点研发计划(2020YFB1711700);上海市经信委“信息化发展专项资金”(XX-XXFZ-02-20-2463);上海市科技创新行动计划(21002411000)

Multi-agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic

WANG Xianwei1, FENG Xiang1,2, YU Huiqun1,2   

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  2. Shanghai Engineering Research Center of Smart Energy, Shanghai 200237, China
  • Received:2023-06-16 Revised:2023-11-16 Online:2024-07-15 Published:2024-07-10
  • About author: WANG Xianwei, born in 1999, postgraduate, is a member of CCF (No.P2627G). His main research interests include reinforcement learning and robot navigation.
    FENG Xiang, born in 1977, Ph.D, professor, is a member of CCF (No.16665M). Her main research interests include distributed swarm intelligence and evolutionary computing, reinforcement learning, and big data intelligence.
  • Supported by:
    National Natural Science Foundation of China (62276097), Key Program of National Natural Science Foundation of China (62136003), National Key Research and Development Program of China (2020YFB1711700), Special Fund for Information Development of Shanghai Economic and Information Commission (XX-XXFZ-02-20-2463) and Scientific Research Program of Shanghai Science and Technology Commission (21002411000).

摘要: 动态障碍物一直是阻碍智能体自主导航发展的关键因素,而躲避障碍物和清理障碍物是两种解决动态障碍物问题的有效方法。近年来,多智能体躲避动态障碍物(避障)问题受到了广大学者的关注,优秀的多智能体避障算法纷纷涌现。然而,多智能体清理动态障碍物(清障)问题却无人问津,相对应的多智能体清障算法更是屈指可数。为解决多智能体清障问题,文中提出了一种基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法(Multi-Agent Cooperative Algorithm for Obstacle Clearance Based on Deep Deterministic Policy Gradient and Attention Critic,MACOC)。首先,创建了首个多智能体协同清障的环境模型,定义了多智能体及动态障碍物的运动学模型,并根据智能体和动态障碍物数量的不同,构建了4种仿真实验环境;其次,将多智能体协同清障过程定义为马尔可夫决策过程(Markov Decision Process,MDP),构建了多智能体的状态空间、动作空间和奖励函数;最后,提出一种基于深度确定性策略梯度与注意力Critic的多智能体协同清障算法,并在多智能体协同清障仿真环境中与经典的多智能体强化学习算法进行对比。实验证明,相比对比算法,所提出的MACOC算法清障的成功率更高、速度更快,对复杂环境的适应性更好。

关键词: 强化学习算法, 马尔可夫决策过程, 多智能体协同控制, 动态障碍物清除, 注意力机制

Abstract: Dynamic obstacles have always been a key factor hindering the development of autonomous navigation for agents. Obstacle avoidance and obstacle clearance are two effective ways to address this issue. In recent years, multi-agent obstacle avoidance (collision avoidance) has been an active research area, and numerous excellent multi-agent obstacle avoidance algorithms have emerged. However, the problem of multi-agent obstacle clearance has received little attention, and the corresponding multi-agent obstacle clearance algorithms are scarce. To address this problem, a multi-agent cooperative algorithm for obstacle clearance based on deep deterministic policy gradient and attention Critic (MACOC) is proposed. Firstly, the first multi-agent cooperative environment model for obstacle clearance is created, the kinematic models of the agents and dynamic obstacles are defined, and four simulation environments with different numbers of agents and dynamic obstacles are constructed. Secondly, the multi-agent cooperative obstacle clearance process is formulated as a Markov decision process (MDP), and the state space, action space, and reward function of the agents are constructed. Finally, the MACOC algorithm based on deep deterministic policy gradient and attention Critic is proposed and compared with classical multi-agent reinforcement learning algorithms in the simulated obstacle clearance environments. Experimental results show that the proposed MACOC algorithm achieves a higher obstacle clearance success rate, faster clearance, and better adaptability to complex environments than the compared algorithms.
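To make the MDP formulation described above concrete, the following is a minimal, illustrative gym-style environment skeleton for cooperative obstacle clearance. The observation layout, the 0.05 clearance radius, the per-step time penalty, and the class name ClearanceEnv are assumptions made for illustration only; they are not the paper's exact state space, action space, or reward function.

```python
import numpy as np

class ClearanceEnv:
    """Toy cooperative clearance MDP: agents move toward dynamic obstacles to clear them (illustrative)."""

    def __init__(self, n_agents: int = 3, n_obstacles: int = 2, dt: float = 0.1):
        self.n_agents, self.n_obstacles, self.dt = n_agents, n_obstacles, dt
        self.reset()

    def reset(self):
        # Random initial positions for agents and obstacles in a unit square arena.
        self.agent_pos = np.random.uniform(-1, 1, (self.n_agents, 2))
        self.obst_pos = np.random.uniform(-1, 1, (self.n_obstacles, 2))
        self.obst_vel = np.random.uniform(-0.1, 0.1, (self.n_obstacles, 2))
        self.cleared = np.zeros(self.n_obstacles, dtype=bool)
        return self._obs()

    def _obs(self):
        # Each agent observes its own position plus all obstacle positions, velocities, and cleared flags.
        world = np.concatenate([self.obst_pos.ravel(), self.obst_vel.ravel(), self.cleared.astype(float)])
        return [np.concatenate([p, world]) for p in self.agent_pos]

    def step(self, actions):
        # actions: one 2-D continuous velocity command per agent (the action space).
        self.agent_pos += self.dt * np.asarray(actions)
        self.obst_pos += self.dt * self.obst_vel              # dynamic obstacles keep moving
        dists = np.linalg.norm(self.agent_pos[:, None] - self.obst_pos[None], axis=-1)
        newly_cleared = (dists < 0.05).any(axis=0) & ~self.cleared
        self.cleared |= newly_cleared
        # Shared team reward: +1 per newly cleared obstacle, minus a small per-step time penalty.
        reward = float(newly_cleared.sum()) - 0.01
        done = bool(self.cleared.all())
        return self._obs(), reward, done, {}
```

Changing n_agents and n_obstacles reproduces the spirit of the four simulation settings mentioned in the abstract, with larger values giving harder cooperative clearance tasks.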
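The abstract's core idea is a centralized Critic that attends over the other agents when evaluating each agent's deterministic policy. Below is a minimal PyTorch sketch of such an attention-weighted centralized critic; the network sizes, the single attention head, and all names (AttentionCritic, q_head, etc.) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    """Q_i(o, a) for agent i, attending over the other agents' (obs, action) encodings."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hid: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + act_dim, hid)     # shared encoder for each (o_j, a_j) pair
        self.query = nn.Linear(hid, hid, bias=False)
        self.key = nn.Linear(hid, hid, bias=False)
        self.value = nn.Linear(hid, hid, bias=False)
        self.q_head = nn.Sequential(nn.Linear(2 * hid, hid), nn.ReLU(),
                                    nn.Linear(hid, 1))       # outputs the scalar Q_i
        self.n_agents = n_agents

    def forward(self, obs, act, i: int):
        # obs: (batch, n_agents, obs_dim); act: (batch, n_agents, act_dim)
        e = F.relu(self.encoder(torch.cat([obs, act], dim=-1)))   # (B, N, hid)
        q_i = self.query(e[:, i])                                  # query from agent i
        others = [j for j in range(self.n_agents) if j != i]
        k = self.key(e[:, others])                                 # (B, N-1, hid)
        v = self.value(e[:, others])                               # (B, N-1, hid)
        scores = torch.einsum("bh,bnh->bn", q_i, k) / k.shape[-1] ** 0.5
        attn = F.softmax(scores, dim=-1)                           # attention weights over the other agents
        ctx = torch.einsum("bn,bnh->bh", attn, v)                  # attended context for agent i
        return self.q_head(torch.cat([e[:, i], ctx], dim=-1))      # (B, 1)
```

In a DDPG-style training loop, each agent's deterministic actor would be updated through the gradient of this Q_i with respect to its own action, while the critic itself is trained against a standard temporal-difference target.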

Key words: Reinforcement learning algorithm, Markov decision process, Multi-agent cooperative control, Dynamic obstacle clearance, Attention mechanism

中图分类号: TP183