Computer Science ›› 2025, Vol. 52 ›› Issue (6): 306-315. doi: 10.11896/jsjkx.240500099

• Artificial Intelligence •

Multi-AGV Path Planning Algorithm Based on Improved DDPG

ZHAO Xuejian, YE Hao, LI Hao, SUN Zhixin   

  1. Modern Postal College, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
    Jiangsu Postal Big Data Technology and Application Engineering Research Center, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
    State Post Bureau Postal Industry Technology Research and Development Center (Internet of Things Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2024-05-22 Revised: 2024-10-26 Online: 2025-06-15 Published: 2025-06-11
  • About author: ZHAO Xuejian, born in 1982, Ph.D., associate professor, is a member of CCF (No. 88401M). His main research interests include data mining and wireless sensor networks.
    SUN Zhixin, born in 1964, Ph.D., professor, doctoral supervisor. His main research interests include the theory and technology of network communication, computer networks and security.
  • Supported by:
    National Natural Science Foundation of China (61972208), China Postdoctoral Science Foundation (2018M640509), and Jiangsu Postgraduate Research and Practice Innovation Project (SICX23_0303, SJCX24_0339).

Abstract: In intelligent logistics, path planning and obstacle avoidance for automated guided vehicles (AGVs) remain significant challenges. Traditional deep reinforcement learning (DRL) methods are limited in efficiency, in dynamic adaptability, and in handling competitive-cooperative interactions among multiple AGVs. This paper presents the improved adaptive co-operative deep deterministic policy gradient (Improved-AC-DDPG) algorithm, an advancement over the standard DDPG. The algorithm constructs state vectors from environmental data and employs a real-time path planning strategy that dynamically creates task sequences to prevent AGV conflicts. It also continuously optimizes policy parameters for obstacle avoidance. Experiments show that Improved-AC-DDPG surpasses both the standard DDPG and the artificial potential field optimization DDPG (APF-DDPG) in convergence speed, obstacle avoidance, path planning, and energy efficiency, thus enhancing multi-AGV system performance. This study provides innovative insights and solutions for multi-agent system modeling and collaboration in dynamic environments, with substantial theoretical and practical implications.
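
For readers unfamiliar with the baseline, the following is a minimal sketch of two update rules from the standard DDPG that the abstract takes as its starting point: the critic's bootstrapped target and the soft (Polyak) update of the target networks. It does not reproduce Improved-AC-DDPG; the abstract gives no equations for the state vector, the task sequences, or the policy optimization. The function names `td_target` and `soft_update` and the default values of `gamma` and `tau` are illustrative assumptions, and network weights are represented as plain lists of floats for simplicity.

```python
from typing import List


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * Q'(s', mu'(s')).

    next_q stands for the target critic's value of the next state under the
    target actor's action; terminal transitions do not bootstrap.
    gamma = 0.99 is an assumed default, not a value from the paper.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target.

    tau = 0.005 is an assumed default, not a value from the paper.
    """
    if len(target) != len(online):
        raise ValueError("target and online parameter lists must have equal length")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


if __name__ == "__main__":
    # One non-terminal transition: reward 1.0, next-state value 10.0.
    print(td_target(1.0, 10.0, done=False, gamma=0.9))
    # Move target weights halfway toward the online weights.
    print(soft_update([0.0, 2.0], [1.0, 2.0], tau=0.5))
```

A small `tau` makes the target networks track the online networks slowly, which is the usual way DDPG stabilizes the critic's regression target.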

Key words: AGV, Path planning, Deep reinforcement learning, DDPG

CLC Number: 

  • TP242