Computer Science ›› 2025, Vol. 52 ›› Issue (12): 239-251. doi: 10.11896/jsjkx.250200059

• Artificial Intelligence •

Line of Sight Guided Self Expert Cloning with Reinforcement Learning for Unmanned Surface Vehicle Path Tracking

LIU Jiahui1, ZHAO Yinuo1, TIAN Feng2, QI Guangpeng3,4, LI Jiangtao2, LIU Chi1   

  1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2 Beijing Civil Aviation Big Data Engineering Technology Research Center, Travelsky Technology Limited, Beijing 100318, China
    3 INSPUR Group Co., Ltd., Jinan 250101, China
    4 INSPUR Yunzhou Industrial Internet Co., Ltd., Jinan 250098, China
  • Received:2025-02-17 Revised:2025-05-19 Online:2025-12-15 Published:2025-12-09
  • About author:LIU Jiahui,born in 1999,postgraduate.His main research interests include deep reinforcement learning and path tracking of unmanned surface vehicles.
    LI Jiangtao,born in 1982,senior engineer.His main research interest is ICT for the Chinese aviation industry.
  • Supported by:
    This work was supported by the Intelligent Dynamic Journey Planning Technology Research Project and the National Natural Science Foundation of China(U23A20310).

Abstract: Unmanned Surface Vehicle(USV) path tracking is crucial for marine autonomous operations,as environmental factors such as wind,waves and currents,together with the USV's control errors,can affect tracking performance.Reinforcement learning(RL),with its online interaction and real-time feedback,offers a promising approach for actively adapting to dynamic environments.However,its trial-and-error training process poses safety risks in real-world applications,and the gap between ideal simulation environments and complex real-world conditions further limits its practical effectiveness.To address these challenges,this paper proposes LECUP(Line-of-sight-guided self-Expert Cloning for USV Path tracking),a new algorithm designed for complex marine environments.LECUP first trains an RL expert in a still-water environment and then uses self-expert cloning to transfer the agent to a more complex environment.To ensure effective knowledge transfer,LECUP introduces a data filling mechanism,in which the experiences accumulated by the self-expert in the still-water environment are dimensionally padded and stored to initialize the agent in the complex environment.Reinforcement learning is then used to fine-tune the agent in the complex environment,further enabling adaptation to its complexities.Moreover,LECUP incorporates a line-of-sight guidance module to compute the target heading,decoupling path tracking control from the specific geometry of the path and enhancing the USV's adaptability to various path shapes.This method enables ongoing policy refinement in complex environments while mitigating the safety risks associated with random initialization.Extensive experimental results show that LECUP outperforms baseline methods in path tracking tasks,especially under challenging conditions.
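As context for the line-of-sight guidance module described above, the target heading in lookahead-based LOS guidance (cf. reference [25]) is typically computed from the cross-track error and a lookahead distance. The Python sketch below is illustrative only; the function name, signature, and fixed lookahead distance are assumptions, not the authors' implementation.

```python
import math

def los_target_heading(x, y, x_p, y_p, gamma_p, lookahead=5.0):
    """Lookahead-based line-of-sight guidance (illustrative sketch).

    (x, y): USV position; (x_p, y_p): nearest point on the reference
    path; gamma_p: path tangent angle at that point; lookahead: tuning
    distance in metres (a fixed constant is assumed here).
    """
    # Signed cross-track error in the path-tangential frame.
    e = -(x - x_p) * math.sin(gamma_p) + (y - y_p) * math.cos(gamma_p)
    # Aim at a virtual point 'lookahead' metres ahead along the path.
    psi_d = gamma_p + math.atan2(-e, lookahead)
    # Wrap the result to (-pi, pi].
    return math.atan2(math.sin(psi_d), math.cos(psi_d))
```

Similarly, the data filling mechanism can be pictured as zero-padding still-water transitions to the complex environment's observation size before seeding the new agent's replay buffer. A minimal sketch follows, assuming the extra disturbance-related dimensions sit at the end of the observation vector; all names and the layout are hypothetical, not the paper's actual design.

```python
import numpy as np

def pad_transition(obs, action, reward, next_obs, done, target_dim):
    """Zero-pad a still-water transition so its observations match the
    richer observation space of the complex environment."""
    pad = target_dim - obs.shape[-1]  # number of missing dimensions
    obs_p = np.concatenate([obs, np.zeros(pad)])
    next_obs_p = np.concatenate([next_obs, np.zeros(pad)])
    return obs_p, action, reward, next_obs_p, done
```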

Key words: Unmanned surface vehicle, Path tracking, Reinforcement learning, Self-expert cloning, Line of sight

CLC Number: TP181

[1]ER M J,MA C,LIU T,et al.Intelligent Motion Control of Unmanned Surface Vehicles:A Critical Review[J].Ocean Engineering,2023,280:114562.
[2]ZHANG W S,LI Z,ZHENG Y.The Current Status and Research and Development Trend of Unmanned Ships at Home and Abroad[J].Ship Science and Technology,2024,46(15):79-83.
[3]ALIM M F A,KADIR R E A,GAMAYANTI N,et al.Autopilot System Design on Monohull USV-LSS01 Using PID-Based Sliding Mode Control Method[C]//IOP Conference Series:Earth and Environmental Science.IOP Publishing,2021:012058.
[4]CAI W,KORDABAD A B,ESFAHANI H N,et al.MPC-Based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles[C]//2021 60th IEEE Conference on Decision and Control(CDC).IEEE,2021:2990-2995.
[5]WINURSITO A,DHEWA O A,NASUHA A,et al.Integral State Feedback Controller with Coefficient Diagram Method for USV Heading Control[C]//2022 5th International Conference on Information and Communications Technology(ICOIACT).IEEE,2022:295-300.
[6]HE S,DAI S L,ZHAO Z,et al.Uncertainty and Disturbance Estimator-Based Distributed Synchronization Control for Multiple Marine Surface Vehicles with Prescribed Performance[J].Ocean Engineering,2022,261:111867.
[7]JIANG X,XIA G.Sliding Mode Formation Control of Leaderless Unmanned Surface Vehicles with Environmental Disturbances[J].Ocean Engineering,2022,244:110301.
[8]LIU Z,YU L,XIANG Q,et al.Research on USV Trajectory Tracking Method Based on LOS Algorithm[C]//2021 14th International Symposium on Computational Intelligence and Design(ISCID).IEEE,2021:408-411.
[9]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-Level Control Through Deep Reinforcement Learning[J].Nature,2015,518(7540):529-533.
[10]PEROLAT J,DE VYLDER B,HENNES D,et al.Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning[J].Science,2022,378(6623):990-996.
[11]CHEMIN J,HILL A,LUCET E,et al.A Study of Reinforcement Learning Techniques for Path Tracking in Autonomous Vehicles[C]//2024 IEEE Intelligent Vehicles Symposium(IV).IEEE,2024:1442-1449.
[12]DAI S S,LIU Q.Action Constrained Deep Reinforcement Learning Based Safe Automatic Driving Method[J].Computer Science,2021,48(9):235-243.
[13]QIN Y,HUANG B,YIN Z H,et al.DexPoint:Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation[C]//Conference on Robot Learning.PMLR,2023:594-605.
[14]HAN D,MULYANA B,STANKOVIC V,et al.A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation[J].Sensors,2023,23(7):3762.
[15]WEN Y,CHEN Y,GUO X.USV Trajectory Tracking Control Based on Receding Horizon Reinforcement Learning[J].Sensors,2024,24(9):2771.
[16]WANG X,HONG Y,XU J,et al.PID Controller Based on Improved DDPG for Trajectory Tracking Control of USV[J].Journal of Marine Science and Engineering,2024,12(10):1771.
[17]GOEL A,CHAUHAN S.Adaptive Look-Ahead Distance for Pure Pursuit Controller with Deep Reinforcement Learning Techniques[C]//Proceedings of the 2021 5th International Conference on Advances in Robotics.2021:1-5.
[18]FAN L,WANG G,HUANG D A,et al.SECANT:Self-Expert Cloning for Zero-Shot Generalization of Visual Policies[C]//International Conference on Machine Learning.PMLR,2021:3088-3099.
[19]XIONG L,YANG X,ZHUO G R,et al.Review on Motion Control of Autonomous Vehicles[J].Journal of Mechanical Engineering,2020,56(10):127-143.
[20]ZHAO H M.Method for Robot Path Tracking Based on Fuzzy Adaptive Tuning PID Control[J].Computer Measurement and Control,2024,32(12):146-152.
[21]YANG K,TANG X,QIN Y,et al.Comparative Study of Trajectory Tracking Control for Automated Vehicles via Model Predictive Control and Robust H-Infinity State Feedback Control[J].Chinese Journal of Mechanical Engineering,2021,34:1-14.
[22]ABDILLAH M,MELLOULI E M.A New Adaptive Second-Order Non-Singular Terminal Sliding Mode Lateral Control Combined with Neural Networks for Autonomous Vehicle[J].International Journal of Vehicle Performance,2024,10(1):50-72.
[23]MANCILLA A,GARCÍA-VALDEZ M,CASTILLO O,et al.Optimal Fuzzy Controller Design for Autonomous Robot Path Tracking Using Population-Based Metaheuristics[J].Symmetry,2022,14(2):202.
[24]ZHANG X,PAN W,SCATTOLINI R,et al.Robust Tube-Based Model Predictive Control with Koopman Operators[J].Automatica,2022,137:110114.
[25]FOSSEN T I,PETTERSEN K Y,GALEAZZI R.Line-of-Sight Path Following for Dubins Paths with Adaptive Sideslip Compensation of Drift Forces[J].IEEE Transactions on Control Systems Technology,2014,23(2):820-827.
[26]AZAM S,MUNIR F,RAFIQUE M A,et al.N2C:Neural Network Controller Design Using Behavioral Cloning[J].IEEE Transactions on Intelligent Transportation Systems,2021,22(7):4744-4756.
[27]WANG S,CHEN Z,ZHAO Z,et al.EscIRL:Evolving Self-Contrastive IRL for Trajectory Prediction in Autonomous Driving [C]//8th Annual Conference on Robot Learning.2024.
[28]CHEN T,ZHANG Z,FANG Z,et al.Imitation Learning from Imperfect Demonstrations for AUV Path Tracking and Obstacle Avoidance[J].Ocean Engineering,2024,298:117287.
[29]YANG S G,CHO E H,KIM J,et al.Deep Reinforcement Learning-Based Path-Tracking for Unmanned Vehicle Navigation Enhancement[C]//2024 International Conference on Electronics,Information,and Communication(ICEIC).IEEE,2024:1-4.
[30]JIANG T M,TAN T,LI H,et al.Path Following of 6-DOF Fixed-Wing UAV Based on Hierarchical Deep Reinforcement Learning[J/OL].https://doi.org/10.19678/j.issn.1000-3428.0070197.
[31]WANG N,JIA W,WU H J.Path Following of Underactuated Marine Vehicles:A Finite-Time Sideslip-Tangent LOS Guidance Approach[J].Control and Decision,2025,40(1):187-195.
[32]YANG Z K,ZHONG W B,FENG Y B,et al.Unmanned Surface Vehicle Track Control Based on Improved LOS and ADRC[J].Chinese Journal of Ship Research,2021,16(1):121-127,135.
[33]YANG C,JIANG X,BAI B,et al.Path Following Control of PID Controller Parameters Optimized by Genetic Algorithm[J].Manufacturing Automation,2022,44(5):78-81.
[34]ZHANG J,ZHANG W,TONG S.Adaptive Neural Optimal Tracking Control for Uncertain Unmanned Surface Vehicle[J].Ocean Engineering,2024,312:119031.
[35]ZHU D,TAO R N,CHEN W,et al.LSTM-Based Sliding Mode Trajectory Tracking Control Algorithm for Unmanned Surface Vehicles[J].Electronic Measurement Technology,2024,47(7):61-68.
[36]YANG S M,SHAN Z,DING Y,et al.Survey of Research on Deep Reinforcement Learning[J].Computer Engineering,2021,47(12):19-29.
[37]LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous Control with Deep Reinforcement Learning[J].arXiv:1509.02971,2015.
[38]HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor[C]//International Conference on Machine Learning.PMLR,2018:1861-1870.
[39]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal Policy Optimization Algorithms[J].arXiv:1707.06347,2017.
[40]KOSTRIKOV I,NAIR A,LEVINE S.Offline Reinforcement Learning with Implicit Q-Learning[J].arXiv:2110.06169,2021.
[41]NAKAMOTO M,ZHAI S,SINGH A,et al.Cal-QL:Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning[J].Advances in Neural Information Processing Systems,2023,36:62244-62269.
[42]LUO Y,JI T,SUN F,et al.Goal-Conditioned Hierarchical Reinforcement Learning with High-Level Model Approximation[J].IEEE Transactions on Neural Networks and Learning Systems,2024,36(2):2705-2719.
[43]YU C,VELU A,VINITSKY E,et al.The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games[J].Advances in Neural Information Processing Systems,2022,35:24611-24624.
[44]KARIMI H R,LU Y.Guidance and Control Methodologies for Marine Vehicles:A Survey[J].Control Engineering Practice,2021,111:104785.