Computer Science ›› 2023, Vol. 50 ›› Issue (10): 156-164. doi: 10.11896/jsjkx.220900031

• Artificial Intelligence •

Intention-based Multi-agent Motion Planning Method with Deep Reinforcement Learning

PENG Yingxuan1, SHI Dianxi1,2,3, YANG Huanhuan1, HU Haomeng1, YANG Shaowu1   

  1. School of Computer Science, National University of Defense Technology, Changsha 410073, China
    2. National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100166, China
    3. Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300457, China
  • Received: 2022-09-05  Revised: 2022-12-12  Online: 2023-10-10  Published: 2023-10-10
  • Corresponding author: SHI Dianxi (dxshi@nudt.edu.cn)
  • About author: PENG Yingxuan (pengyingxuan@nudt.edu.cn), born in 1998, postgraduate. Her main research interests include artificial intelligence, multi-agent collaboration, reinforcement learning and machine learning. SHI Dianxi, born in 1966, Ph.D., professor, Ph.D. supervisor. His main research interests include distributed object middleware technology, adaptive software technology, artificial intelligence, and robot operating systems.
  • Supported by: National Natural Science Foundation of China (91948303).

Abstract: Existing multi-agent motion planning methods suffer from a lack of effective cooperation mechanisms, a heavy dependence on communication, and the absence of mechanisms for screening input information. To address these problems, an intention-based multi-agent deep reinforcement learning motion planning method is proposed, which enables agents to reach their goals without collisions and without explicit communication. First, the concept of intention is introduced into the multi-agent motion planning problem: an agent's visual images are combined with its history maps to predict the agent's intention, so that agents can anticipate the actions of other agents and thus cooperate effectively. Second, a convolutional neural network architecture based on an attention mechanism is designed; this network predicts agents' intentions and selects their actions, screening the useful visual input information while reducing the reliance of multi-agent cooperation on communication. Third, a value-based deep reinforcement learning algorithm is proposed to learn the motion planning policy, and the policy is made more stable by improving the objective function and the calculation of the Q values. Tested in six different simulation scenes on the PyBullet simulation platform, the proposed method improves the cooperation efficiency of multi-agent teams by an average of 10.74% over other state-of-the-art multi-agent motion planning methods, a significant performance advantage.
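
As a concrete illustration of the attention-based architecture sketched in the abstract, the following is a minimal PyTorch sketch, not the authors' released implementation: it shows how a CBAM-style channel/spatial attention block can sit inside a small CNN that consumes a stacked observation (visual image plus history map) and produces both intention logits and Q-values. All layer sizes, the number of intention classes, and the size of the action space are illustrative assumptions.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: re-weights feature channels using pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        scale = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * scale

class SpatialAttention(nn.Module):
    """Spatial attention: highlights informative spatial locations."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class IntentionQNetwork(nn.Module):
    """Shared CNN trunk with attention, followed by two heads: intention
    logits (a prediction of what an agent will do next) and Q-values."""
    def __init__(self, in_channels: int = 4, n_intentions: int = 8, n_actions: int = 5):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            ChannelAttention(64), SpatialAttention(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.intention_head = nn.Linear(64, n_intentions)
        self.q_head = nn.Linear(64, n_actions)

    def forward(self, obs):
        z = self.trunk(obs)
        return self.intention_head(z), self.q_head(z)

# Example: a batch of two observations, each stacking an RGB image with a
# one-channel history map (4 channels in total) on a 64x64 grid.
net = IntentionQNetwork()
intent_logits, q_values = net(torch.randn(2, 4, 64, 64))
print(intent_logits.shape, q_values.shape)  # torch.Size([2, 8]) torch.Size([2, 5])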
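
The abstract also mentions stabilizing the policy by improving the objective function and the Q-value calculation, without spelling out the exact formula. The sketch below shows one standard stabilization of this kind, a double-Q-style TD target in which the online network selects the next action and a separate target network evaluates it; treat it as a baseline illustration rather than the authors' modified objective. It reuses the hypothetical IntentionQNetwork defined above.

import torch
import torch.nn.functional as F

def double_q_target(rewards, dones, next_obs, online_net, target_net, gamma=0.99):
    """Double-Q TD target: decouple action selection from action evaluation."""
    with torch.no_grad():
        _, next_q_online = online_net(next_obs)   # online net picks the action
        _, next_q_target = target_net(next_obs)   # target net scores that action
        next_action = next_q_online.argmax(dim=1, keepdim=True)
        next_value = next_q_target.gather(1, next_action).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_value

# Usage: regress the online net's Q(s, a) toward the target with a Huber loss.
# The batch is assumed to come from a replay buffer with shapes
# obs/next_obs [B, 4, 64, 64], actions [B, 1] (long), rewards/dones [B] (float).
def td_loss(batch, online_net, target_net):
    obs, actions, rewards, dones, next_obs = batch
    _, q = online_net(obs)
    q_sa = q.gather(1, actions).squeeze(1)
    target = double_q_target(rewards, dones, next_obs, online_net, target_net)
    return F.smooth_l1_loss(q_sa, target)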

Key words: Intention, Attention mechanism, Multi-agent system, Motion planning, Deep reinforcement learning

CLC Number: TP391