Computer Science, 2023, Vol. 50, Issue (8): 271-279. doi: 10.11896/jsjkx.220700210
XING Linquan1,2, XIAO Yingmin1,2, YANG Zhibin1,2, WEI Zhengmin1,2, ZHOU Yong1,2, GAO Saijun3
Abstract: As spacecraft rendezvous and docking missions grow increasingly complex, the demands on their efficiency, autonomy, and safety have risen sharply. In recent years, applying reinforcement learning to spacecraft rendezvous guidance has become an international research frontier. Obstacle collision avoidance is critical to safe rendezvous and docking, yet general reinforcement learning algorithms impose no safety restrictions on the exploration space, which makes designing rendezvous guidance policies challenging. To address this, a spacecraft rendezvous guidance method based on safe reinforcement learning is proposed. First, a Markov model of autonomous spacecraft rendezvous under collision-avoidance scenarios is designed, and a reward mechanism based on obstacle early warning and collision-avoidance constraints is proposed, thereby establishing a safe reinforcement learning framework for solving spacecraft rendezvous guidance policies. Second, within this framework, guidance policies are generated with two deep reinforcement learning algorithms: proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG). Experimental results show that the method avoids obstacles effectively and completes rendezvous with high accuracy. Moreover, an analysis of the two algorithms' relative performance and generalization ability further demonstrates the effectiveness of the proposed method.
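The abstract names a Markov model of autonomous rendezvous but does not give its state-transition dynamics. A common choice for proximity operations is the Clohessy-Wiltshire (CW) relative-motion equations in the target's local orbital frame; the Python sketch below shows one discrete transition step under that assumption. The mean motion `N_MEAN_MOTION` and step size `DT` are illustrative values, not the paper's.

```python
import numpy as np

# Illustrative constants (not from the paper): mean motion of a roughly
# 500 km circular LEO target orbit, and a 1 s explicit-Euler step.
N_MEAN_MOTION = 1.1e-3   # rad/s
DT = 1.0                 # s

def cw_step(state, accel, n=N_MEAN_MOTION, dt=DT):
    """One explicit-Euler step of the Clohessy-Wiltshire relative dynamics.

    state : [x, y, z, vx, vy, vz], chaser position/velocity in the
            target-centred local orbital frame (m, m/s).
    accel : commanded thrust acceleration [ax, ay, az] (m/s^2).
    """
    x, y, z, vx, vy, vz = state
    ax = 3.0 * n**2 * x + 2.0 * n * vy + accel[0]   # radial
    ay = -2.0 * n * vx + accel[1]                   # along-track
    az = -n**2 * z + accel[2]                       # cross-track
    return np.array([x + vx * dt, y + vy * dt, z + vz * dt,
                     vx + ax * dt, vy + ay * dt, vz + az * dt])
```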
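The reward mechanism based on obstacle early warning and collision-avoidance constraints is likewise only named in the abstract. The minimal sketch below shows the general shape such a reward could take: an approach-shaping term, a graded penalty inside a hypothetical early-warning radius, and a terminal penalty when a hard collision-avoidance radius is violated. All thresholds and weights (`TARGET_TOL`, `VEL_TOL`, `WARN_RADIUS`, `SAFE_RADIUS`) are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Illustrative thresholds (not from the paper).
TARGET_TOL = 1.0    # docking position tolerance (m)
VEL_TOL = 0.1       # docking velocity tolerance (m/s)
WARN_RADIUS = 50.0  # obstacle early-warning radius (m)
SAFE_RADIUS = 10.0  # hard collision-avoidance radius (m)

def reward(chaser_pos, chaser_vel, obstacle_pos):
    """Shaped reward with obstacle early warning and a collision constraint.

    Returns (reward, terminated, success); the target sits at the origin
    of the relative frame.
    """
    dist_target = np.linalg.norm(chaser_pos)
    dist_obs = np.linalg.norm(chaser_pos - obstacle_pos)

    r = -0.01 * dist_target                     # approach shaping
    if dist_obs < WARN_RADIUS:                  # graded early-warning penalty
        r -= 0.1 * (WARN_RADIUS - dist_obs)

    if dist_obs < SAFE_RADIUS:                  # constraint violated: end episode
        return r - 100.0, True, False
    if dist_target < TARGET_TOL and np.linalg.norm(chaser_vel) < VEL_TOL:
        return r + 100.0, True, True            # successful rendezvous
    return r, False, False
```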
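Finally, the abstract states that guidance policies are generated with PPO and DDPG within this framework. As one way such a setup could be reproduced, the sketch below wraps the two functions above in a minimal gymnasium environment and trains it with the off-the-shelf stable-baselines3 implementations of both algorithms. The class name `RendezvousEnv`, the initial state, obstacle position, action bounds, episode cap, and training budgets are all hypothetical; the paper's own hyperparameters and network architectures are not given in the abstract.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import DDPG, PPO

class RendezvousEnv(gym.Env):
    """Toy rendezvous environment built on cw_step and reward above."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-0.1, 0.1, shape=(3,), dtype=np.float32)  # m/s^2
        self.obstacle = np.array([100.0, 20.0, 0.0])  # fixed obstacle; illustrative

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.array([200.0, 50.0, 0.0, 0.0, 0.0, 0.0])  # illustrative start
        self.steps = 0
        return self.state.astype(np.float32), {}

    def step(self, action):
        self.state = cw_step(self.state, action)
        r, terminated, _ = reward(self.state[:3], self.state[3:], self.obstacle)
        self.steps += 1
        truncated = self.steps >= 2000  # episode cap; illustrative
        return self.state.astype(np.float32), float(r), terminated, truncated, {}

# Train both algorithms on the same environment; budgets are illustrative.
for Algo in (PPO, DDPG):
    model = Algo("MlpPolicy", RendezvousEnv(), verbose=0)
    model.learn(total_timesteps=50_000)
```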