Computer Science, 2025, Vol. 52, Issue (12): 239-251. doi: 10.11896/jsjkx.250200059
刘嘉辉1, 赵一诺1, 田丰2, 齐光鹏3,4, 李江涛2, 刘驰1
LIU Jiahui1, ZHAO Yinuo1, TIAN Feng2, QI Guangpeng3,4, LI Jiangtao2, LIU Chi1
Abstract: Path tracking by unmanned surface vehicles (USVs) is essential to autonomous maritime operations, yet wind, waves, currents, and the USV's own control errors all degrade tracking performance. Reinforcement learning (RL), through online interaction and real-time feedback, can actively adapt to dynamic environments and therefore shows strong promise for USV path-tracking tasks. However, its trial-and-error training paradigm carries safety risks in practical use, and the gap between idealized simulation scenarios and complex real-world environments further limits its effectiveness in deployment. To address these challenges, this paper proposes LECUP, a USV path-tracking algorithm that fuses line-of-sight (LOS) guidance with self-expert-cloning reinforcement learning. LECUP first trains an expert policy in a calm-water environment and then transfers the agent to more complex environments through self-expert cloning. To ensure that knowledge transfers effectively, LECUP introduces a data-padding mechanism: the experience data accumulated by the self-expert in calm water is padded to a higher dimension, stored, and used to initialize the agent in the complex environment. The agent is then fine-tuned with reinforcement learning to adapt further to that environment. In addition, LECUP uses the LOS algorithm to compute the target heading, which decouples path-tracking control from the path's geometry and strengthens the USV's adaptability to paths of different shapes. The method not only keeps optimizing the policy in complex environments but also mitigates the safety risks introduced by random initialization. Extensive experimental results show that LECUP accomplishes the USV path-tracking task better than baseline methods.
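The abstract does not give implementation details, but the two reusable ideas it names can be illustrated with a minimal Python sketch: a Fossen-style LOS steering law for the target heading, and zero-padding of calm-water transitions up to the complex environment's state dimension so they can seed its replay buffer. The function names, the lookahead parameter, and zero-padding as the specific up-dimensioning scheme are assumptions for illustration, not the paper's actual code.

```python
import math
import numpy as np

def los_target_heading(pos, wp_prev, wp_next, lookahead=5.0):
    """Line-of-sight guidance: desired heading toward a lookahead
    point on the current path segment (Fossen-style LOS)."""
    # Orientation of the path segment between the two waypoints.
    path_angle = math.atan2(wp_next[1] - wp_prev[1],
                            wp_next[0] - wp_prev[0])
    # Cross-track error: signed lateral distance of the USV from the segment.
    dx, dy = pos[0] - wp_prev[0], pos[1] - wp_prev[1]
    cross_track = -dx * math.sin(path_angle) + dy * math.cos(path_angle)
    # Steering law: the heading correction vanishes as the error vanishes,
    # which is what decouples the controller from the path's geometry.
    return path_angle + math.atan2(-cross_track, lookahead)

def pad_transition(state, action, reward, next_state, target_dim):
    """Zero-pad a calm-water transition (hypothetical scheme) to the
    state dimension used in the complex environment."""
    def pad(s):
        return np.concatenate([s, np.zeros(target_dim - s.shape[0])])
    return pad(state), action, reward, pad(next_state)
```

A known trade-off in the LOS law: a larger lookahead distance yields smoother headings but slower convergence onto the path, while a smaller one converges faster at the cost of more aggressive steering.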