Computer Science, 2025, Vol. 52, Issue (11A): 250200095-7. doi: 10.11896/jsjkx.250200095

• Artificial Intelligence •

Research on Multi-agent Joint Navigation Strategy Based on Improved Deep Reinforcement Learning

XIA Weihao1, WANG Jinlong2

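  1 State-owned Changhong Machinery Factory, Guilin, Guangxi 541003, China
    2 School of Information Science and Engineering, Harbin Institute of Technology, Weihai, Shandong 264209, China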
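  • Publication date: 2025-11-15  Release date: 2025-11-10
  • Corresponding author: XIA Weihao (610452197@qq.com)
  • Supported by:
    Major Science and Technology Innovation Project of Shandong Province (2021ZLGX05) and Key Support Project of the National Natural Science Foundation of China (Joint Fund) (U23A20336).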


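Abstract: Driven by rapid advances in artificial intelligence, multi-agent systems have shown their potential for joint navigation in many practical applications, such as environmental monitoring, disaster relief, and autonomous driving. These tasks can generally be formulated as a multi-agent joint navigation problem. As the number of agents taking part in a task grows, scaling reinforcement learning to multi-agent systems runs into low efficiency and learning inertia, which severely limit task performance. This paper proposes a new multi-agent reinforcement learning model. The model builds a two-layer policy network that lets each agent take its peers' policies into account in a partially observable environment, which speeds up learning. In addition, a dynamic reward mechanism is introduced to address poor joint navigation performance. Experimental results show that the deep reinforcement learning model based on the two-layer policy network significantly improves cooperation efficiency in multi-agent joint navigation tasks, and that its advantage is more pronounced when the number of agents is large.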

Key words: Multi-agent, Joint navigation, Deep reinforcement learning
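The abstract describes the two-layer policy network only at the level of its idea, not its architecture. The NumPy sketch below shows one hypothetical reading, and every detail in it is an assumption: the first layer maps each agent's local observation to a preliminary action distribution (a "proposal"), agents exchange these proposals, and the second layer refines each agent's action from its own observation, its own proposal, and the mean of its peers' proposals. The class `TwoLayerPolicy`, the parameter sharing across agents, the layer sizes, and the random (untrained) weights are illustrative choices, not the authors' design.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class TwoLayerPolicy:
    """Hypothetical two-layer policy shared by all agents (untrained, random weights).

    Layer 1: local observation -> preliminary action distribution ("proposal").
    Layer 2: [own observation, own proposal, mean of peers' proposals] -> final distribution.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.v1 = rng.normal(scale=0.1, size=(hidden, n_actions))
        self.w2 = rng.normal(scale=0.1, size=(obs_dim + 2 * n_actions, hidden))
        self.v2 = rng.normal(scale=0.1, size=(hidden, n_actions))

    def propose(self, obs: np.ndarray) -> np.ndarray:
        """Layer 1: each agent's proposal from its own (partial) observation only."""
        return softmax(np.tanh(obs @ self.w1) @ self.v1)

    def act(self, obs: np.ndarray) -> np.ndarray:
        """obs has shape (n_agents, obs_dim); returns (n_agents, n_actions) action probabilities."""
        n = obs.shape[0]
        proposals = self.propose(obs)
        # Mean of the other agents' proposals (self excluded); zeros if there is only one agent.
        peer_mean = (proposals.sum(axis=0, keepdims=True) - proposals) / max(n - 1, 1)
        x = np.concatenate([obs, proposals, peer_mean], axis=1)
        return softmax(np.tanh(x @ self.w2) @ self.v2)
```

For four agents with 6-dimensional observations and 5 discrete actions, `TwoLayerPolicy(obs_dim=6, n_actions=5).act(obs)` returns a (4, 5) array whose rows each sum to 1. Averaging the peers' proposals keeps the second-layer input size fixed however many agents there are, which is one plausible way to address the scaling concern raised in the abstract; the paper may well use a different aggregation.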

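The abstract likewise does not define the dynamic reward mechanism. For context, the cooperative navigation task of the multi-agent particle environment (MPE) rewards the team with minus the sum, over landmarks, of the distance from each landmark to its nearest agent, and penalizes collisions between agents. The pure-Python sketch below keeps that structure and adds a hypothetical dynamic element: the collision weight grows linearly with training progress, so agents first learn to cover landmarks and are later pushed harder to avoid one another. The function names, `radius`, `w_start`, `w_end`, and the linear schedule are assumptions for illustration only.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def coverage_term(agents: list[Point], landmarks: list[Point]) -> float:
    """Team coverage: minus the sum over landmarks of the distance to the nearest agent."""
    return -sum(min(math.dist(lm, a) for a in agents) for lm in landmarks)


def collision_count(agents: list[Point], radius: float) -> int:
    """Number of unordered agent pairs whose discs (equal radius) overlap."""
    return sum(1 for a, b in combinations(agents, 2) if math.dist(a, b) < 2 * radius)


def dynamic_reward(agents: list[Point], landmarks: list[Point], progress: float,
                   radius: float = 0.05, w_start: float = 0.1, w_end: float = 1.0) -> float:
    """Hypothetical dynamic reward: the collision weight ramps linearly from
    w_start to w_end as training progress goes from 0 to 1 (values outside are clamped)."""
    p = min(max(progress, 0.0), 1.0)
    w_col = w_start + (w_end - w_start) * p
    return coverage_term(agents, landmarks) - w_col * collision_count(agents, radius)
```

For example, two agents at (0, 0) and (0.05, 0) with landmarks at (0, 0) and (1, 0) give a coverage term of -0.95 and one collision, so the reward is -1.05 at the start of training (progress = 0) and -1.95 at the end (progress = 1).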

CLC number: V324.1