Computer Science ›› 2024, Vol. 51 ›› Issue (2): 268-277. doi: 10.11896/jsjkx.230500113
SHI Dianxi1,2, PENG Yingxuan2,3, YANG Huanhuan2,3, OUYANG Qianying1,2, ZHANG Yuhui2, HAO Feng1
Abstract: As a classic value-based deep reinforcement learning method, DQN has been widely applied in fields such as multi-agent motion planning. However, DQN faces a series of challenges: it overestimates Q-values, its Q-value computation is relatively complex, its neural network has no memory of past observations, and its ε-greedy exploration strategy is inefficient. To address these problems, this paper proposes a DQN-based multi-agent deep reinforcement learning motion planning method that helps agents learn efficient and stable motion planning policies and reach their goals without collisions. First, building on DQN, a Dueling-based Q-value computation mechanism is proposed, which decomposes the Q-value into a state value and an advantage function value and selects the optimal action according to the parameters of the Q-network currently being updated, making Q-value computation simpler and more accurate. Second, a GRU-based memory mechanism is proposed: a GRU module is introduced so that the network can capture temporal information and process the agents' historical observations. Finally, a noise-based exploration mechanism is proposed: by injecting parameterized noise into the network, it changes DQN's exploration scheme, improves the agents' exploration efficiency, and brings the multi-agent system to an exploration-exploitation balance. Experiments in six different simulation scenarios on the PyBullet platform show that the proposed method enables a multi-agent team to cooperate efficiently and reach their respective goals without collisions, with a stable policy training process.
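Two of the mechanisms named in the abstract can be illustrated compactly. The Dueling decomposition computes Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'), and NoisyNet-style exploration replaces ε-greedy by sampling parameterized noise on the network weights. The sketch below is a minimal linear NumPy illustration of these two ideas only, not the paper's implementation; the feature dimension, action count, and noise scale are arbitrary assumptions, and the GRU memory module is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dueling_q(features, w_v, w_a):
    """Combine a state value and per-action advantages into Q-values:
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    v = features @ w_v          # state value V(s), shape (1,)
    a = features @ w_a          # advantages A(s, a), shape (n_actions,)
    return v + a - a.mean()     # subtracting the mean keeps V identifiable

def noisy_weights(mu, sigma):
    """Parameterized noise on the weights (NoisyNet-style): w = mu + sigma * eps.
    Exploration then comes from the sampled weights rather than epsilon-greedy."""
    return mu + sigma * rng.standard_normal(mu.shape)

# Hypothetical sizes: an 8-dim state embedding and 4 discrete actions.
features = rng.standard_normal(8)
w_v = rng.standard_normal((8, 1))
w_a_mu = rng.standard_normal((8, 4))
w_a_sigma = 0.1 * np.ones((8, 4))

q = dueling_q(features, w_v, noisy_weights(w_a_mu, w_a_sigma))
best_action = int(np.argmax(q))   # greedy action w.r.t. the online (currently updated) network
print(q, best_action)
```

Because the advantages are mean-centered, the mean of the resulting Q-values equals the state value V(s), which is what makes the value/advantage split well-defined.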