Computer Science ›› 2023, Vol. 50 ›› Issue (8): 271-279. doi: 10.11896/jsjkx.220700210

• Information Security •

Spacecraft Rendezvous Guidance Method Based on Safe Reinforcement Learning

XING Linquan1,2, XIAO Yingmin1,2, YANG Zhibin1,2, WEI Zhengmin1,2, ZHOU Yong1,2, GAO Saijun3   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
    2. Key Laboratory of Safety-critical Software, Ministry of Industry and Information Technology, Nanjing 211106, China
    3. Shanghai Aerospace Electronic Technology Institute, Shanghai 201109, China
  • Received: 2022-07-24  Revised: 2022-11-04  Online: 2023-08-15  Published: 2023-08-02
  • About author: XING Linquan, born in 1998, postgraduate. His main research interests include reinforcement learning and safety-critical software.
    YANG Zhibin, born in 1982, Ph.D, professor, postdoctoral researcher. His main research interests include safety-critical systems, formal verification and AI software engineering.
  • Supported by:
    National Natural Science Foundation of China (62072233), National Defense Basic Scientific Research Program of China (JCKY2020205C006) and Postgraduate Research & Practice Innovation Program of NUAA (xcxjh20211604).

Abstract: With the increasing complexity of spacecraft rendezvous and docking tasks, ever higher demands are placed on their efficiency, autonomy and reliability. In recent years, introducing reinforcement learning to solve the spacecraft rendezvous guidance problem has become an international research hotspot. Obstacle avoidance is critical for safe spacecraft rendezvous, yet general reinforcement learning algorithms impose no safety restrictions on exploration, which makes the design of a spacecraft rendezvous guidance policy challenging. This paper proposes a spacecraft rendezvous guidance method based on safe reinforcement learning. First, a Markov model of autonomous spacecraft rendezvous in a collision-avoidance scenario is designed and a reward mechanism based on obstacle warning and collision-avoidance constraints is proposed, establishing a safe reinforcement learning framework for solving the spacecraft rendezvous guidance policy. Second, within this framework, guidance policies are generated with two deep reinforcement learning algorithms, proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG). Experimental results show that the method can effectively avoid obstacles and complete the rendezvous with high accuracy. In addition, the performance and generalization ability of the two algorithms are analyzed, which further demonstrates the effectiveness of the proposed method.
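The reward mechanism summarized above, combining an obstacle warning zone with a hard collision-avoidance constraint, can be illustrated with a minimal sketch. The Python snippet below is a hypothetical reconstruction, not the authors' implementation: the function name, gains, radii and termination rules are all assumptions, chosen only to show how a graded warning penalty and a terminal collision penalty can coexist in a single step reward.

```python
# Illustrative sketch of an "obstacle warning + collision avoidance" reward,
# in the spirit of the paper's mechanism. All constants are hypothetical.
import numpy as np

def shaped_reward(chaser_pos, chaser_vel, target_pos, obstacle_pos,
                  warn_radius=50.0, collision_radius=10.0, dock_radius=1.0):
    """Return (reward, done) for one step of the rendezvous MDP."""
    dist_to_target = np.linalg.norm(target_pos - chaser_pos)
    dist_to_obstacle = np.linalg.norm(obstacle_pos - chaser_pos)

    # Dense progress term: moving toward the target is rewarded.
    reward = -0.01 * dist_to_target

    # Warning zone: graded penalty that grows as the chaser nears the obstacle.
    if dist_to_obstacle < warn_radius:
        reward -= (warn_radius - dist_to_obstacle) / warn_radius

    # Hard collision-avoidance constraint: large penalty and episode end.
    if dist_to_obstacle < collision_radius:
        return reward - 100.0, True

    # Successful rendezvous: bonus reduced by residual speed (soft docking).
    if dist_to_target < dock_radius:
        return reward + 100.0 - np.linalg.norm(chaser_vel), True

    return reward, False
```

In such a sketch, the positions and velocity would come from the relative-motion dynamics that propagate the Markov model's state, and a PPO or DDPG agent would receive this reward at every simulation step.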

Key words: Spacecraft rendezvous guidance, Obstacle avoidance, Safe reinforcement learning, Proximal policy optimization, Deep deterministic policy gradient

CLC Number: TP311