Computer Science (计算机科学), 2023, Vol. 50, Issue (11): 269-281. doi: 10.11896/jsjkx.221000131

• Artificial Intelligence •

Multi-ship Coordinated Collision Avoidance Decision Based on Improved Twin Delayed Deep Deterministic Policy Gradient

HUANG Renxian1,2,3, LUO Liang1,2, YANG Meng4, LIU Weiqin1

  1 School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430064, China
  2 Key Laboratory of High Performance Ship Technology (Wuhan University of Technology), Ministry of Education, Wuhan 430064, China
  3 Sanya Science and Education Innovation Park of Wuhan University of Technology, Sanya, Hainan 572019, China
  4 China Ship Development and Design Center, Wuhan 430060, China
  • Received: 2022-10-17 Revised: 2023-03-14 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: LUO Liang (luoliang610@163.com)
  • About author: (hrx751770645@163.com) HUANG Renxian, born in 1998, postgraduate. His main research interests include artificial intelligence and data processing. LUO Liang, born in 1980, Ph.D, associate professor, Ph.D supervisor. His main research interests include system simulation integration, ship-related digital technology, and high-performance computing.
  • Supported by:
    National Defense Basic Scientific Research Program of China (JCKY2020206B037).

Abstract: At present, most maritime collision avoidance models treat each ship as a single agent making its own collision avoidance decisions, without considering coordinated avoidance between ships. In multi-ship encounter scenarios, relying on single-ship maneuvers alone leads to poor avoidance performance. To address this, this paper proposes a multi-ship coordinated collision avoidance model based on softmax deep double deterministic policy gradients (SD3), an improvement on the twin delayed deep deterministic policy gradient (TD3) algorithm. Starting from the temporal and spatial factors of navigation safety, a time collision model and a space collision model are constructed to analyze ship collision risk quantitatively. On this basis, a ship domain model that changes dynamically with the encounter situation and the ship velocity vector is used to analyze collision risk qualitatively. The reward function combines goal guidance, course angle change, course keeping, collision risk, and the constraints of the International Regulations for Preventing Collisions at Sea (COLREGs). Based on the typical encounter situations in COLREGs, encounter scenarios in which head-on, overtaking, and crossing situations coexist are constructed for collision avoidance simulation. Ablation experiments show that the softmax operator improves the performance of SD3, giving it better decision-making in coordinated ship collision avoidance. SD3 is also compared with other reinforcement learning algorithms in terms of learning efficiency and learning effect. Experimental results show that SD3 makes accurate collision avoidance decisions efficiently in complex scenarios where multiple encounter situations coexist, and that it outperforms the other reinforcement learning algorithms.
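The role of the softmax operator named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption for illustration only: the function names `softmax_value` and `sd3_style_target`, the inverse temperature `beta`, the discount `gamma`, and the idea of evaluating two critics on a handful of sampled next actions are all hypothetical, and the sketch omits the importance-sampling correction that a full SD3 implementation would normally include.

```python
import math


def softmax_value(q_values, beta):
    """Softmax (Boltzmann) weighted average of Q-values over sampled actions.

    beta = 0 gives the plain mean; a large beta approaches the maximum.
    """
    q_max = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def sd3_style_target(reward, q1_next, q2_next, beta, gamma=0.99, done=False):
    """Hypothetical bootstrapped target in the spirit of SD3.

    q1_next and q2_next hold the two target critics' estimates for the same
    sampled next actions. The element-wise minimum is the clipped double
    estimate used by TD3; the softmax then aggregates over sampled actions.
    """
    clipped = [min(a, b) for a, b in zip(q1_next, q2_next)]
    bootstrap = 0.0 if done else gamma * softmax_value(clipped, beta)
    return reward + bootstrap


if __name__ == "__main__":
    sampled_q = [1.0, 2.0, 3.0]
    for beta in (0.0, 1.0, 50.0):
        print(beta, softmax_value(sampled_q, beta))
```

With `beta = 0` the operator returns the mean of the sampled values, and as `beta` grows it moves toward their maximum, so `beta` interpolates between an averaging estimate and a greedy one.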

Key words: Multi-ship encounter, Coordinated collision avoidance, Intelligent decision-making, Twin delayed deep deterministic policy gradient (TD3), Softmax deep double deterministic policy gradients (SD3), Reinforcement learning

CLC Number: TP391.9