Computer Science ›› 2025, Vol. 52 ›› Issue (6): 306-315. doi: 10.11896/jsjkx.240500099
赵学健, 叶昊, 李豪, 孙知信
ZHAO Xuejian, YE Hao, LI Hao, SUN Zhixin
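Abstract: In automation and intelligent logistics, path planning for multi-AGV (Automated Guided Vehicle) systems is a key technical challenge. To address the efficiency, cooperation and competition, and dynamic-environment adaptability problems that traditional deep reinforcement learning methods face in multi-AGV systems, this paper proposes an improved adaptive cooperative deep deterministic policy gradient algorithm, Improved-AC-DDPG (Improved-Adaptive Cooperative-Deep Deterministic Policy Gradient). The algorithm builds state vectors from collected environment data and plans paths in real time, dynamically generates task sequences to reduce conflicts between AGVs, monitors the environment and predictively adjusts its obstacle-avoidance strategy, and continuously optimizes the policy parameters. Experimental results show that, compared with conventional DDPG and artificial-potential-field-optimized DDPG (Artificial Potential Field-Deep Deterministic Policy Gradient, APF-DDPG), Improved-AC-DDPG performs better in convergence speed, obstacle avoidance, path-planning quality, and energy consumption, significantly improving the efficiency and safety of multi-AGV systems. This work offers a new approach to the modeling and cooperation of multi-agent systems in dynamic environments and has significant theoretical value and application potential.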
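For readers unfamiliar with the baseline, the following is a minimal sketch of three building blocks of standard DDPG (Lillicrap et al., 2015), not of the Improved-AC-DDPG algorithm itself, whose details the abstract does not give. The state layout in `build_state` (own position, offset to goal, offset to nearest obstacle) and all function names and default hyperparameters are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def build_state(agv_xy: Sequence[float],
                goal_xy: Sequence[float],
                obstacle_xy: Sequence[float]) -> List[float]:
    """Hypothetical state vector: own position, offset to goal, offset to nearest obstacle."""
    return [
        agv_xy[0], agv_xy[1],
        goal_xy[0] - agv_xy[0], goal_xy[1] - agv_xy[1],
        obstacle_xy[0] - agv_xy[0], obstacle_xy[1] - agv_xy[1],
    ]


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * Q'(s', mu'(s')); no bootstrapping at terminal states."""
    return reward if done else reward + gamma * next_q


def soft_update(target: Sequence[float], online: Sequence[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


if __name__ == "__main__":
    # One AGV at the origin, goal at (4, 3), nearest obstacle at (1, -1).
    print(build_state((0.0, 0.0), (4.0, 3.0), (1.0, -1.0)))
    # Target for a non-terminal step with a step penalty of -1.
    print(td_target(reward=-1.0, next_q=10.0, done=False, gamma=0.9))
    # Target-network weights drift slowly toward the online weights.
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

In a full implementation the parameter lists would be neural-network weights and `next_q` would come from the target critic evaluated at the target actor's action; the plain lists here only make the update rules visible.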