Computer Science ›› 2025, Vol. 52 ›› Issue (12): 239-251. doi: 10.11896/jsjkx.250200059

• Artificial Intelligence •


Line-of-Sight Guided Self-Expert Cloning with Reinforcement Learning for Unmanned Surface Vehicle Path Tracking

LIU Jiahui1, ZHAO Yinuo1, TIAN Feng2, QI Guangpeng3,4, LI Jiangtao2, LIU Chi1   

    1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2 Beijing Civil Aviation Big Data Engineering Technology Research Center, Travelsky Technology Limited, Beijing 100318, China
    3 INSPUR Group Co., Ltd., Jinan 250101, China
    4 INSPUR Yunzhou Industrial Internet Co., Ltd., Jinan 250098, China
  • Received: 2025-02-17  Revised: 2025-05-19  Published: 2025-12-15  Online: 2025-12-09
  • Corresponding author: LI Jiangtao (lijtao@travelsky.com.cn)
  • About author: LIU Jiahui (ljhjiayoua@126.com), born in 1999, postgraduate. His main research interests include deep reinforcement learning and path tracking of unmanned surface vehicles.
    LI Jiangtao, born in 1982, senior engineer. His main research interest is ICT for China's aviation industry.
  • Supported by:
    This work was supported by the Intelligent Dynamic Journey Planning Technology Research Project and the National Natural Science Foundation of China (U23A20310).


Abstract: Unmanned Surface Vehicle (USV) path tracking is crucial for marine autonomous operations, yet environmental factors such as wind, waves, and currents, together with the USV's own control errors, can degrade tracking performance. Reinforcement learning (RL), with its online interaction and real-time feedback, offers a promising approach for actively adapting to dynamic environments. However, its trial-and-error training process poses safety risks in real-world applications, and the gap between ideal simulation environments and complex real-world conditions further limits its practical effectiveness. To address these challenges, this paper proposes LECUP (Line-of-sight-guided self-Expert Cloning for USV Path tracking), a new algorithm designed for complex marine environments. LECUP first trains an RL expert in a still-water environment and then uses self-expert cloning to transfer the agent to a more complex environment. To ensure effective knowledge transfer, LECUP introduces a data-filling mechanism in which the experiences accumulated by the self-expert in the still-water environment are dimensionally padded and stored to initialize the agent in the complex environment. Reinforcement learning is then used to fine-tune the agent in the complex environment, further enabling adaptation to its complexities. Moreover, LECUP incorporates a line-of-sight guidance module to calculate the target heading, decoupling path-tracking control from the specific geometry of the path and enhancing the USV's adaptability to various path shapes. This method enables ongoing policy refinement in complex environments while mitigating the safety risks associated with random initialization. Extensive experimental results show that LECUP outperforms baseline methods in path-tracking tasks, especially under challenging conditions.
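The line-of-sight guidance module mentioned above follows the standard LOS idea: project the vessel's position onto the active path segment, measure the cross-track error, and aim at a point a fixed lookahead distance further along the segment, so the controller only ever sees a heading error rather than the path's geometry. The Python sketch below is a minimal, generic illustration of that computation, assuming a straight segment between two waypoints and a constant lookahead distance; the function name and parameters are illustrative and not taken from the paper.

import math

def los_target_heading(x, y, wp_prev, wp_next, lookahead=5.0):
    """Textbook line-of-sight guidance for a straight path segment.

    Returns the desired heading (rad) that steers a vessel at (x, y) toward a
    point `lookahead` metres ahead of its projection onto the segment from
    wp_prev to wp_next.
    """
    xk, yk = wp_prev
    xk1, yk1 = wp_next
    alpha = math.atan2(yk1 - yk, xk1 - xk)  # path tangent angle
    # Signed cross-track error: lateral offset of the vessel from the segment
    e = -(x - xk) * math.sin(alpha) + (y - yk) * math.cos(alpha)
    # Steer back toward the path; a larger error yields a sharper correction
    return alpha + math.atan2(-e, lookahead)

Because the policy acts on the heading error produced by such a module rather than on raw path coordinates, the same controller can in principle track straight lines, circles, or arbitrary waypoint sequences, which is the decoupling the abstract describes.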

Key words: Unmanned surface vehicle, Path tracking, Reinforcement learning, Self-expert cloning, Line of sight
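The data-filling step in the abstract, in which still-water expert experience is dimensionally padded and stored to initialize the agent in the disturbed environment, can be pictured roughly as follows. This is a speculative sketch of one plausible reading: the zero-padding choice, the transition layout, and the plain-list buffer are assumptions made for illustration, not the paper's actual implementation.

import numpy as np

def pad_observation(obs, target_dim):
    # Zero-pad a still-water observation so its dimensionality matches the
    # richer state used in the wind/wave/current environment (assumption:
    # the extra dimensions encode disturbance-related features).
    obs = np.asarray(obs, dtype=np.float32)
    pad = np.zeros(target_dim - obs.shape[-1], dtype=np.float32)
    return np.concatenate([obs, pad])

def fill_buffer_from_expert(expert_transitions, target_dim):
    # Seed the complex-environment replay buffer (here just a list of tuples)
    # with padded expert transitions before online fine-tuning begins.
    buffer = []
    for obs, action, reward, next_obs, done in expert_transitions:
        buffer.append((pad_observation(obs, target_dim), action, reward,
                       pad_observation(next_obs, target_dim), done))
    return buffer

Starting fine-tuning from a buffer seeded in this way, rather than from random exploration, is what underpins the abstract's claim that the approach mitigates the safety risks of random initialization.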

CLC Number: 

  • TP181