Computer Science ›› 2025, Vol. 52 ›› Issue (6): 306-315. doi: 10.11896/jsjkx.240500099

• Artificial Intelligence •

Multi-AGV Path Planning Algorithm Based on Improved DDPG

ZHAO Xuejian, YE Hao, LI Hao, SUN Zhixin

  1. Modern Postal College, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
    Jiangsu Postal Big Data Technology and Application Engineering Research Center, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
    State Post Bureau Postal Industry Technology Research and Development Center (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2024-05-22 Revised: 2024-10-26 Published: 2025-06-15 Online: 2025-06-11
  • Corresponding author: SUN Zhixin (sunzx@njupt.edu.cn)
  • About the authors: ZHAO Xuejian (zhaoxj@njupt.edu.cn), born in 1982, Ph.D., associate professor, is a member of CCF (No. 88401M). His main research interests include data mining and wireless sensor networks.
    SUN Zhixin, born in 1964, Ph.D., professor, doctoral supervisor. His main research interests include the theory and technology of network communication, computer networks, and security.
  • Supported by:
    National Natural Science Foundation of China (61972208), China Postdoctoral Science Foundation (2018M640509), and Jiangsu Postgraduate Research and Practice Innovation Project (SICX23_0303, SJCX24_0339).

Abstract: In automated and intelligent logistics, path planning for systems of multiple automated guided vehicles (AGVs) is a key technical challenge. Traditional deep reinforcement learning (DRL) methods applied to multi-AGV systems are limited in efficiency, in handling cooperative-competitive interactions among AGVs, and in adapting to dynamic environments. This paper proposes an improved adaptive cooperative deep deterministic policy gradient algorithm (Improved-AC-DDPG). The algorithm builds state vectors from collected environmental data, plans paths in real time, and dynamically generates task sequences to reduce conflicts between AGVs. It also monitors and predictively adjusts its obstacle-avoidance strategy while continuously optimizing the policy parameters. Experiments show that Improved-AC-DDPG outperforms both standard DDPG and artificial potential field optimized DDPG (APF-DDPG) in convergence speed, obstacle avoidance, path planning quality, and energy consumption, improving the efficiency and safety of multi-AGV systems. The study offers a new approach to modeling and collaboration for multi-agent systems in dynamic environments, with both theoretical and practical relevance.

Key words: AGV, Path planning, Deep reinforcement learning, DDPG
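
For readers unfamiliar with the baseline named in the abstract, the two update rules at the core of standard DDPG can be sketched as follows. This is a minimal illustration of the generic DDPG critic target and soft target-network update only; it is not the Improved-AC-DDPG method, whose details the abstract does not give. The function names `td_target` and `soft_update`, and the default values of `gamma` and `tau`, are assumptions chosen for illustration.

```python
from typing import List


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.99) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q is the target critic's value for the next state under the
    target actor's action; terminal transitions do not bootstrap.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging of target-network weights toward the online weights:
    theta_target <- tau * theta_online + (1 - tau) * theta_target.
    """
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

A small `tau` makes the target networks trail the online networks slowly, which is what stabilizes the bootstrapped critic target in standard DDPG.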

CLC Number:

  • TP242