Computer Science ›› 2025, Vol. 52 ›› Issue (4): 40-48.doi: 10.11896/jsjkx.241000084

• Smart Embedded Systems •

Autonomous Obstacle Avoidance Method for Unmanned Surface Vehicles Based on Improved Proximal Policy Optimization

KONG Chao1, WANG Wei1, HUANG Subin1, ZHANG Yi1, MENG Dan2   

  1. School of Computer and Information,Anhui Polytechnic University,Wuhu,Anhui 241000,China
    2. Oppo Research Institute,Shenzhen,Guangdong 518000,China
  • Received:2024-10-17 Revised:2025-02-18 Online:2025-04-15 Published:2025-04-14
  • About author:KONG Chao,born in 1986,Ph.D,professor.His main research interests include massive data mining,smart education,and recommender systems.
    MENG Dan,born in 1990,Ph.D,senior research expert.Her main research interests include multimodal machine learning,trustworthy AI,federated learning,and cloud-edge-IoT.
  • Supported by:
    Science Research Project of Anhui Higher Education Institutions(2023AH050914,2024AH052239),Quality Engineering Project of Anhui Higher Education Institutions(2023zybj018),Anhui Provincial Natural Science Foundation(2308085MF220),Science and Technology Project of Wuhu City(2023pt07,2023ly13) and Quality Improvement Program of Anhui Polytechnic University(2022lzyybj02,2023jyxm15,2024jyxm76).

Abstract: Autonomous obstacle avoidance has become a critical challenge in expanding the application scenarios of unmanned surface vehicles (USVs). Traditional USV obstacle avoidance methods rely mainly on fine-grained environmental modeling. In complex marine environments, however, USVs have difficulty obtaining complete perception states, leading to insufficient model accuracy. To address this issue, we propose an improved proximal policy optimization (PPO)-based autonomous obstacle avoidance method for USVs. First, a perception and decision framework for USVs based on the Markov decision process is constructed. Then, a feature-sharing representation optimization module that fuses recurrent neural networks is designed to enhance the USV's memory of temporal environmental perception. Finally, an autonomous obstacle avoidance reward function incorporating a reward shaping mechanism is designed to accelerate optimization of the USV obstacle avoidance policy. To verify the effectiveness of the proposed algorithm, a typical USV autonomous obstacle avoidance verification scenario is constructed on a three-dimensional simulation platform. Experimental results show that the improved PPO-based method achieves collision-free autonomous navigation for USVs and outperforms the traditional PPO algorithm in model convergence speed, collision rate, and timeout rate.
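As a general illustration of the reward shaping mechanism the abstract refers to, the sketch below follows the potential-based formulation of Ng et al. [23], in which the shaped reward r' = r + γΦ(s') − Φ(s) provably preserves the optimal policy. The distance-to-goal potential, the state layout, and the constants are illustrative assumptions, not the paper's exact reward design.

```python
import math

GAMMA = 0.99  # discount factor (illustrative value)

def potential(state):
    """Potential Phi(s): negative Euclidean distance from the USV to its goal.
    `state` is assumed to be a dict with 2-D 'pos' and 'goal' coordinates."""
    (x, y), (gx, gy) = state["pos"], state["goal"]
    return -math.hypot(gx - x, gy - y)

def shaped_reward(base_reward, state, next_state):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return base_reward + GAMMA * potential(next_state) - potential(state)

# Example: the USV moves 1 m closer to a goal that was 10 m away,
# earning a positive shaping bonus even if the base reward is zero.
s  = {"pos": (0.0, 0.0), "goal": (10.0, 0.0)}
s2 = {"pos": (1.0, 0.0), "goal": (10.0, 0.0)}
print(shaped_reward(0.0, s, s2))  # ~1.09: dense progress signal
```

Dense shaping terms like this speed up policy optimization because the agent receives feedback at every step rather than only at collisions or goal arrival.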

Key words: Unmanned surface vehicles, Autonomous obstacle avoidance, Proximal policy optimization, Temporal perception, Reward shaping

CLC Number: U664.82
[1]BARRERA C,PADRON I,LUIS F S,et al.Trends and challenges in unmanned surface vehicles(USV):From survey to shipping[J].TransNav:International Journal on Marine Navigation and Safety of Sea Transportation,2021,15(1):135-142.
[2]YAN R,PANG S,SUN H,et al.Development and missions of unmanned surface vehicle[J].Journal of Marine Science and Application,2010,9:451-457.
[3]POLVARA R,SHARMA S,WAN J,et al.Obstacle avoidance approaches for autonomous navigation of unmanned surface vehicles[J].The Journal of Navigation,2018,71(1):241-256.
[4]GUAN W,WANG K.Autonomous collision avoidance of unmanned surface vehicles based on improved A-star and dynamic window approach algorithms[J].IEEE Intelligent Transportation Systems Magazine,2023,15(3):36-50.
[5]ZHANG T,LI Q,ZHANG C,et al.Current trends in the development of intelligent unmanned autonomous systems[J].Frontiers of Information Technology & Electronic Engineering,2017,18:68-85.
[6]MA Y,WANG Z,YANG H,et al.Artificial intelligence applications in the development of autonomous vehicles:A survey[J].IEEE/CAA Journal of Automatica Sinica,2020,7(2):315-329.
[7]DONG S,WANG P,ABBAS K.A survey on deep learning and its applications[J].Computer Science Review,2021,40:100379.
[8]YE D,LIU Z,SUN M,et al.Mastering complex control in MOBA games with deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:6672-6679.
[9]LU J,HAN L,WEI Q,et al.Event-triggered deep reinforcement learning using parallel control:A case study in autonomous driving[J].IEEE Transactions on Intelligent Vehicles,2023,8(4):2821-2831.
[10]SINGH B,KUMAR R,SINGH V P.Reinforcement learning in robotic applications:a comprehensive survey[J].Artificial Intelligence Review,2022,55(2):945-990.
[11]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[12]GUAN W,WANG K.Autonomous collision avoidance of unmanned surface vehicles based on improved A-star and dynamic window approach algorithms[J].IEEE Intelligent Transportation Systems Magazine,2023,15(3):36-50.
[13]BAI X,LI B,XU X,et al.USV path planning algorithm based on plant growth[J].Ocean Engineering,2023,273:113965.
[14]YU J,YANG M,ZHAO Z,et al.Path planning of unmanned surface vessel in an unknown environment based on improved D* Lite algorithm[J].Ocean Engineering,2022,266:112873.
[15]OUYANG Z,WANG H,HUANG Y,et al.Path planning technologies for USV formation based on improved RRT[J].Chinese Journal of Ship Research,2020,15(3):18-24.
[16]ZHAO Y,MA Y,HU S.USV formation and path-following control via deep reinforcement learning with random braking[J].IEEE Transactions on Neural Networks and Learning Systems,2021,32(12):5468-5478.
[17]WU X,CHEN H,CHEN C,et al.The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method[J].Knowledge-Based Systems,2020,196:105201.
[18]XU X,LU Y,LIU X,et al.Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs[J].Ocean Engineering,2020,217:107704.
[19]GAN W,QU X,SONG D,et al.Multi-USV cooperative chasing strategy based on obstacles assistance and deep reinforcement learning[J].IEEE Transactions on Automation Science and Engineering,2023,21(4):5895-5910.
[20]WANG W,LUO X,LI Y,et al.Unmanned surface vessel obstacle avoidance with prior knowledge‐based reward shaping[J].Concurrency and Computation:Practice and Experience,2021,33(9):e6110.
[21]RAMACHANDRAN P,ZOPH B,LE Q V.Searching for activation functions[J].arXiv:1710.05941,2017.
[22]PHANICHRAKSAPHONG V,TSAI W H.An Empirical Generation Technique on Background Music Using Gated Recurrent Neural Networks[C]//2023 International Conference on Consumer Electronics-Taiwan.IEEE,2023:691-692.
[23]NG A Y,HARADA D,RUSSELL S.Policy invariance under reward transformations:Theory and application to reward shaping[C]//Proceedings of the Sixteenth International Conference on Machine Learning.1999:278-287.
[24]ALMÓN-MANZANO L,PASTOR-VARGAS R,TRONCOSO J M C.Deep reinforcement learning in agents' training:Unity ML-agents[C]//International Work-Conference on the Interplay Between Natural and Artificial Computation.Cham:Springer International Publishing,2022:391-400.
[25]LILLICRAP T P.Continuous control with deep reinforcement learning[J].arXiv:1509.02971,2015.