Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200095-7. doi: 10.11896/jsjkx.250200095
XIA Weihao (夏为浩)1, WANG Jinlong (王金龙)2
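Abstract: Driven by rapid advances in artificial intelligence, multi-agent systems have shown potential for joint navigation in practical applications such as environmental monitoring, disaster rescue, and autonomous driving. These tasks can generally be formulated as multi-agent joint navigation problems. As the number of participating agents grows, scaling reinforcement learning to multi-agent systems runs into problems such as low efficiency and learning inertia, which severely limit task performance. This paper proposes a novel multi-agent reinforcement learning model. The model builds a two-layer policy network that lets each agent take its peers' policies into account in a partially observable environment, which speeds up learning. In addition, a dynamic reward mechanism is introduced to address poor joint navigation performance. Experimental results show that the deep reinforcement learning model based on the two-layer policy network significantly improves joint efficiency in multi-agent joint navigation tasks, and that its advantage is more pronounced when the number of agents is large.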
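The abstract does not give the network's structure, so the following is only an illustrative sketch of one way a two-level policy could let an agent account for its peers under partial observability. Everything in it is an assumption made for illustration: the class name `TwoLevelPolicy`, the linear layers, the dimensions, and the choice to estimate each peer's action distribution from the agent's own observation. It is not the architecture evaluated in the paper, and it omits training and the dynamic reward mechanism entirely.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, inputs):
    # weights holds one row per output unit; no bias term, for brevity.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TwoLevelPolicy:
    """Hypothetical two-level policy for a single agent (illustration only)."""

    def __init__(self, obs_dim, n_actions, n_peers, seed=0):
        rng = random.Random(seed)
        # Level 1 (assumed): one small model per peer that guesses the
        # peer's action distribution from this agent's own observation.
        self.peer_models = [
            random_matrix(n_actions, obs_dim, rng) for _ in range(n_peers)
        ]
        # Level 2 (assumed): the agent's own policy, conditioned on its
        # observation concatenated with every peer estimate.
        self.own_model = random_matrix(
            n_actions, obs_dim + n_peers * n_actions, rng
        )

    def peer_estimates(self, obs):
        return [softmax(linear(w, obs)) for w in self.peer_models]

    def action_probs(self, obs):
        features = list(obs)
        for estimate in self.peer_estimates(obs):
            features.extend(estimate)
        return softmax(linear(self.own_model, features))


policy = TwoLevelPolicy(obs_dim=4, n_actions=3, n_peers=2)
probs = policy.action_probs([0.5, -0.2, 0.1, 0.9])
```

Here `probs` is a distribution over the agent's three hypothetical actions. The point of the split is that the second level never sees the peers' true observations or actions, only the first level's estimates, which is what partial observability would require at execution time.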