Computer Science ›› 2023, Vol. 50 ›› Issue (8): 271-279. doi: 10.11896/jsjkx.220700210

• Information Security •


Spacecraft Rendezvous Guidance Method Based on Safe Reinforcement Learning

XING Linquan1,2, XIAO Yingmin1,2, YANG Zhibin1,2, WEI Zhengmin1,2, ZHOU Yong1,2, GAO Saijun3   

  1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
    2 Key Laboratory of Safety-critical Software, Ministry of Industry and Information Technology, Nanjing 211106, China
    3 Shanghai Aerospace Electronic Technology Institute, Shanghai 201109, China
  • Received: 2022-07-24  Revised: 2022-11-04  Online: 2023-08-15  Published: 2023-08-02
  • Corresponding author: YANG Zhibin (yangzhibin168@163.com)
  • About author: XING Linquan (xinglq@nuaa.edu.cn), born in 1998, postgraduate. His main research interests include reinforcement learning and safety-critical software.
    YANG Zhibin, born in 1982, Ph.D, professor, postdoctoral researcher. His main research interests include safety-critical systems, formal verification and AI software engineering.
  • Supported by: National Natural Science Foundation of China (62072233), National Defense Basic Scientific Research Program of China (JCKY2020205C006) and Postgraduate Research & Practice Innovation Program of NUAA (xcxjh20211604).


Abstract: As spacecraft rendezvous and docking tasks grow more complex, the demands on their efficiency, autonomy and safety increase sharply. In recent years, introducing reinforcement learning to solve the spacecraft rendezvous guidance problem has become an international research frontier. Obstacle avoidance is critical for safe spacecraft rendezvous and docking, yet general reinforcement learning algorithms impose no safety restrictions on the exploration space, which makes the design of spacecraft rendezvous guidance policies challenging. This paper proposes a spacecraft rendezvous guidance method based on safe reinforcement learning. First, a Markov model of autonomous spacecraft rendezvous under collision-avoidance scenarios is designed and a reward mechanism based on obstacle warning and collision-avoidance constraints is proposed, thereby establishing a safe reinforcement learning framework for solving the spacecraft rendezvous guidance policy. Second, within this framework, guidance policies are generated with two deep reinforcement learning algorithms, proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG). Experimental results show that the method effectively avoids obstacles and completes the rendezvous with high accuracy. In addition, the performance and generalization ability of the two algorithms are analyzed and compared, further demonstrating the effectiveness of the proposed method.
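To make the formulation above concrete, the sketch below (Python, NumPy only) illustrates the kind of Markov model and safety-shaped reward the abstract describes: planar double-integrator relative dynamics standing in for the true orbital relative motion, a graded penalty inside an obstacle-warning zone, and a hard keep-out constraint that ends the episode. The class name `RendezvousEnv` and all dynamics, thresholds and weights are illustrative assumptions, not the paper's actual model or tuning.

```python
import numpy as np

# Minimal sketch of the safe-RL rendezvous MDP outlined in the abstract.
# The dynamics, thresholds and reward weights below are illustrative
# assumptions, NOT the paper's actual model or tuning.

class RendezvousEnv:
    """Planar chaser-to-target rendezvous with one spherical obstacle."""

    DT = 1.0             # integration step, s (assumed)
    WARN_RADIUS = 50.0   # obstacle-warning zone radius, m (assumed)
    KEEP_OUT = 20.0      # hard keep-out radius, m (assumed)
    DOCK_RADIUS = 1.0    # rendezvous success radius, m (assumed)

    def __init__(self, obstacle=(200.0, 100.0)):
        self.obstacle = np.asarray(obstacle, dtype=float)
        self.reset()

    def reset(self):
        # State: relative position and velocity of the chaser w.r.t. the target.
        self.pos = np.array([500.0, 300.0])
        self.vel = np.zeros(2)
        return np.concatenate([self.pos, self.vel])

    def step(self, thrust):
        # Double-integrator stand-in for the true relative orbital dynamics
        # (the paper would use proper relative-motion equations instead).
        self.vel = self.vel + np.clip(thrust, -0.1, 0.1) * self.DT
        self.pos = self.pos + self.vel * self.DT

        dist_target = np.linalg.norm(self.pos)
        dist_obstacle = np.linalg.norm(self.pos - self.obstacle)

        # Reward mechanism: progress toward the target and fuel economy,
        # a graded penalty inside the warning zone, and a large terminal
        # penalty when the hard collision-avoidance constraint is violated.
        reward = -0.01 * dist_target - 0.001 * float(np.linalg.norm(thrust))
        if dist_obstacle < self.WARN_RADIUS:
            reward -= (self.WARN_RADIUS - dist_obstacle) / self.WARN_RADIUS

        done = False
        if dist_obstacle < self.KEEP_OUT:       # safety constraint violated
            reward -= 100.0
            done = True
        elif dist_target < self.DOCK_RADIUS:    # successful rendezvous
            reward += 100.0
            done = True

        return np.concatenate([self.pos, self.vel]), reward, done
```

A PPO or DDPG agent is then trained against `step()` in the usual episodic loop; the graded warning-zone penalty steers exploration away from the obstacle before the hard keep-out constraint can be violated, which is the intuition behind the obstacle-warning reward described above.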

Key words: Spacecraft rendezvous guidance, Obstacle avoidance, Safe reinforcement learning, Proximal policy optimization, Deep deterministic policy gradient
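For reference, the standard objectives behind the last two keywords are the PPO clipped surrogate and the deterministic policy gradient used by DDPG. These are the published formulations from the original papers, not anything specific to this work:

```latex
% PPO clipped surrogate objective, with probability ratio r_t(theta)
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]

% Deterministic policy gradient used by DDPG (actor update)
\nabla_{\theta^\mu} J \approx \hat{\mathbb{E}}_t\!\left[ \left.\nabla_a Q(s,a \mid \theta^Q)\right|_{s=s_t,\,a=\mu(s_t)}
  \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s=s_t} \right]
```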

CLC number: TP311