DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning

Computer Science, 2024, Vol. 51, Issue (2): 268-277. doi: 10.11896/jsjkx.230500113

Section: Artificial Intelligence

SHI Dianxi^1,2, PENG Yingxuan^2,3, YANG Huanhuan^2,3, OUYANG Qianying^1,2, ZHANG Yuhui^2, HAO Feng^1

1 Intelligent Game and Decision Lab (IGDL), Beijing 100091, China
2 Tianjin Artificial Intelligence Innovation Center, Tianjin 300457, China
3 College of Computer, National University of Defense Technology, Changsha 410073, China

Received: 2023-05-17; Revised: 2023-11-03; Online: 2024-02-15; Published: 2024-02-22
About author: SHI Dianxi, born in 1966, Ph.D., professor, Ph.D. supervisor. His main research interests include artificial intelligence, robot operating systems, distributed computing and cloud computing. HAO Feng, born in 1977, master's degree, associate professor. His main research interests include artificial intelligence, mechanical and electronic engineering, and computer applications.
Supported by: Science and Technology Innovation 2030 Major Project (2020AAA0104802) and National Natural Science Foundation of China (91948303).

Abstract

DQN, a classical value-based deep reinforcement learning method, has been widely used in multi-agent motion planning. However, DQN faces several challenges: it tends to overestimate Q values, its Q-value computation is comparatively complex, its neural network has no memory of historical information, and its ε-greedy exploration strategy is inefficient. To address these problems, this paper proposes a DQN-based multi-agent deep reinforcement learning motion planning method that helps agents learn an efficient and stable motion planning policy and reach their target points without collision. First, building on DQN, a Dueling-based optimization mechanism for Q-value computation is proposed. It decomposes the Q value into a state value and an advantage function value, and selects the optimal action using the parameters of the Q network that is currently being updated, which makes the Q-value computation simpler and more accurate. Second, a GRU-based memory mechanism is proposed: a GRU module is introduced so that the network can capture temporal information and process the agents' historical information. Third, a noise-based exploration mechanism is proposed, which replaces the exploration mode of DQN with parameterized noise, improves the agents' exploration efficiency, and brings the multi-agent system to a balance between exploration and exploitation. The method is tested in six different simulation scenarios on the PyBullet simulation platform. The results show that it enables multi-agent teams to collaborate efficiently and reach their respective target points without collision, and that the policy training process is more stable.
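
To make the Q-value and exploration mechanisms described in the abstract more concrete, the following is a minimal, dependency-free Python sketch of the standard building blocks they are named after: the mean-subtracted Dueling aggregation of Wang et al. (2016), a Double-DQN style target in which the network being updated selects the next action and a separate target network evaluates it, and a factorized-Gaussian noisy linear layer in the style of NoisyNet (Fortunato et al., 2018). It is not the authors' implementation. The abstract does not specify the aggregation rule, the noise parameterization, or the network sizes, so those choices are assumptions, and all function names (`dueling_q`, `double_dqn_target`, `scaled_noise`, `noisy_linear`) are hypothetical. The GRU memory module is omitted.

```python
# Hypothetical helpers: a sketch of standard Dueling / Double-DQN / NoisyNet
# components, NOT the paper's code. Aggregation and noise form are assumed.
import math
import random
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Aggregate V(s) and A(s, a) into Q(s, a) with the mean-subtracted rule
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """TD target in which the online network (the one being updated) picks
    the next action and the target network evaluates it."""
    if done:
        return reward
    best_next = argmax(q_online_next)
    return reward + gamma * q_target_next[best_next]


def scaled_noise(n: int, rng: random.Random) -> List[float]:
    """Sample n standard Gaussians and apply f(x) = sign(x) * sqrt(|x|)."""
    samples = (rng.gauss(0.0, 1.0) for _ in range(n))
    return [math.copysign(math.sqrt(abs(x)), x) for x in samples]


def noisy_linear(x: Sequence[float],
                 mu_w: Sequence[Sequence[float]],
                 sigma_w: Sequence[Sequence[float]],
                 mu_b: Sequence[float],
                 sigma_b: Sequence[float],
                 rng: random.Random) -> List[float]:
    """Factorized-Gaussian noisy layer:
    y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b,
    with eps_w[j][i] = f(eps_out[j]) * f(eps_in[i]) and eps_b = f(eps_out)."""
    eps_in = scaled_noise(len(x), rng)
    eps_out = scaled_noise(len(mu_b), rng)
    y = []
    for j in range(len(mu_b)):
        total = mu_b[j] + sigma_b[j] * eps_out[j]
        for i in range(len(x)):
            w = mu_w[j][i] + sigma_w[j][i] * eps_out[j] * eps_in[i]
            total += w * x[i]
        y.append(total)
    return y


# Usage: dueling aggregation, a Double-DQN target, and a greedy action taken
# on a noisy output (no separate epsilon-greedy step is used).
if __name__ == "__main__":
    rng = random.Random(0)
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]
    print(double_dqn_target(1.0, False, 0.9, [0.2, 0.8, 0.5], [1.0, 0.4, 2.0]))
    noisy_q = noisy_linear([1.0, 2.0], [[1.0, 0.0], [0.5, 0.5]],
                           [[0.1, 0.1], [0.1, 0.1]], [0.1, -0.1], [0.1, 0.1], rng)
    print(argmax(noisy_q))
```

Because the noise perturbs the layer weights directly, acting greedily on the noisy output already explores, which is why such layers can stand in for ε-greedy. Setting every σ entry to zero recovers an ordinary deterministic linear layer. In the Double-DQN example, a plain max over the target network would give 1.0 + 0.9 × 2.0 = 2.8, whereas letting the online network choose the action gives 1.0 + 0.9 × 0.4 = 1.36, illustrating how decoupling selection from evaluation can curb overestimation.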

Key words: Multi-agent system, Motion planning, Deep reinforcement learning, DQN

CLC Number: TP391

SHI Dianxi, PENG Yingxuan, YANG Huanhuan, OUYANG Qianying, ZHANG Yuhui, HAO Feng. DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning[J]. Computer Science, 2024, 51(2): 268-277.
