Computer Science ›› 2023, Vol. 50 ›› Issue (10): 156-164. doi: 10.11896/jsjkx.220900031
PENG Yingxuan1, SHI Dianxi1,2,3, YANG Huanhuan1, HU Haomeng1, YANG Shaowu1
Abstract: Existing multi-agent motion planning approaches suffer from a lack of effective cooperation mechanisms, heavy reliance on communication, and the absence of an information-filtering mechanism. To address these problems, this paper proposes an intention-based multi-agent deep reinforcement learning motion planning method that enables agents to reach their goals collision-free without explicit communication. First, the concept of intention is introduced into the multi-agent motion planning problem: an agent's visual images are combined with its historical map to predict its intention, so that agents can anticipate the actions of other agents and thereby cooperate effectively. Second, an attention-based convolutional neural network architecture is designed to predict agents' intentions and select their actions; it filters out the useful visual input information while reducing the dependence of multi-agent cooperation on communication. Finally, a value-based deep reinforcement learning algorithm is proposed to learn the motion planning policy, with an improved objective function and Q-value computation that make the learned policy more stable. Experiments in six different simulation scenarios on the PyBullet platform show that, compared with other state-of-the-art multi-agent motion planning methods, the proposed method improves the cooperation efficiency of the multi-agent team by 10.74% on average, a significant performance advantage.
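The abstract mentions an "improved Q-value computation" for the value-based learner but does not give the formula. A common realization of this idea in value-based deep RL is the double-Q target, where the online network selects the greedy next action and a separate target network evaluates it, which mitigates Q-value overestimation. The sketch below is a generic illustration of that target computation under these assumptions, not the paper's actual implementation; the function name and toy numbers are hypothetical.

```python
def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double-Q style bootstrap target for one transition.

    next_q_online / next_q_target: lists of Q-values over the discrete
    actions at the next state, from the online and target networks.
    """
    # Online network picks the greedy next action ...
    greedy = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... the target network evaluates it; terminal states bootstrap nothing.
    return reward + gamma * next_q_target[greedy] * (0.0 if done else 1.0)

# Toy transition with three discrete actions (illustrative numbers only).
y = ddqn_target(reward=1.0,
                next_q_online=[0.2, 0.9, 0.1],   # greedy action: index 1
                next_q_target=[0.3, 0.8, 0.2],   # evaluated value: 0.8
                done=False)
# y = 1.0 + 0.99 * 0.8 = 1.792
```

Decoupling action selection from action evaluation in this way is the standard stabilization trick for value-based agents (cf. the double Q-learning line of work); a multi-agent variant would apply the same target per agent.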