Computer Science ›› 2025, Vol. 52 ›› Issue (4): 40-48. doi: 10.11896/jsjkx.241000084

• Intelligent Embedded Systems •

Autonomous Obstacle Avoidance Method for Unmanned Surface Vehicles Based on Improved Proximal Policy Optimization

KONG Chao1, WANG Wei1, HUANG Subin1, ZHANG Yi1, MENG Dan2

  1. 1 School of Computer and Information, Anhui Polytechnic University, Wuhu, Anhui 241000, China
    2 OPPO Research Institute, Shenzhen, Guangdong 518000, China
  • Received: 2024-10-17 Revised: 2025-02-18 Online: 2025-04-15 Published: 2025-04-14
  • Corresponding author: MENG Dan (mengdan90@163.com)
  • About author: KONG Chao (kongchao@ahpu.edu.cn), born in 1986, Ph.D., professor. His main research interests include massive data mining, smart education, and recommender systems.
    MENG Dan, born in 1990, Ph.D., senior research expert. Her main research interests include multimodal machine learning, trustworthy AI, federated learning, and cloud-edge-IoT.
  • Supported by:
    Science Research Project of Anhui Higher Education Institutions (2023AH050914, 2024AH052239), Quality Engineering Project of Anhui Higher Education Institutions (2023zybj018), Anhui Provincial Natural Science Foundation (2308085MF220), Science and Technology Project of Wuhu City (2023pt07, 2023ly13), and Quality Improvement Program of Anhui Polytechnic University (2022lzyybj02, 2023jyxm15, 2024jyxm76).

Abstract: Autonomous obstacle avoidance has become a critical challenge in expanding the application scenarios of unmanned surface vehicles (USVs). Traditional USV obstacle avoidance methods rely mainly on fine-grained environmental modeling. However, in complex marine environments, USVs have difficulty obtaining complete perception states, which leads to insufficient model accuracy. To address this issue, we propose an autonomous obstacle avoidance method for USVs based on improved proximal policy optimization (PPO). First, a perception and decision framework for USVs based on the Markov decision process is constructed. Then, a feature-sharing representation optimization module is designed by integrating a recurrent neural network into PPO, enhancing the USV's ability to remember temporal environmental perception. Finally, an autonomous obstacle avoidance reward function is designed using a reward shaping mechanism to speed up the optimization of the USV obstacle avoidance policy. To verify the effectiveness of the proposed algorithm, a typical verification scenario for USV autonomous obstacle avoidance is constructed on a three-dimensional simulation platform. Experimental results show that the improved PPO-based method achieves collision-free autonomous navigation for USVs and outperforms the traditional PPO algorithm in model convergence speed, collision rate, and timeout rate.

Key words: Unmanned surface vehicles, Autonomous obstacle avoidance, Proximal policy optimization, Temporal perception, Reward shaping
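
The two standard building blocks named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `clipped_surrogate` is the per-sample clipped objective from the original PPO formulation, and `shaped_reward` is textbook potential-based reward shaping. The function names, the distance-to-goal potential, and the default values of `eps` and `gamma` are all assumptions made for illustration; the paper's recurrent module and its actual reward terms are not reproduced here.

```python
import math


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective.

    ratio     = pi_new(a|s) / pi_old(a|s)
    advantage = advantage estimate for the sampled action
    eps       = clip range (0.2 is an assumed, commonly used default)
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Taking the minimum removes any incentive to push the ratio
    # outside [1 - eps, 1 + eps] in the direction that helps.
    return min(ratio * advantage, clipped_ratio * advantage)


def potential(pos, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(pos, goal)


def shaped_reward(base_reward, pos, next_pos, goal, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return base_reward + gamma * potential(next_pos, goal) - potential(pos, goal)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped at (1 + eps) * A.
    print(clipped_surrogate(1.5, 2.0))
    # Moving toward the goal yields a positive shaping bonus.
    print(shaped_reward(0.0, (0.0, 0.0), (3.0, 0.0), (10.0, 0.0), gamma=1.0))
```

Shaping terms of this form leave the optimal policy unchanged while giving the agent a denser learning signal, which is the usual motivation for using them to speed up training.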

CLC Number:

  • U664.82