Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231200165-6.doi: 10.11896/jsjkx.231200165

• Information Security •

Intelligent Penetration Path Based on Improved PPO Algorithm

WANG Ziyang, WANG Jia, XIONG Mingliang, WANG Wentao   

  1. School of Computer Science and Technology, Xinjiang University, Urumqi 830000, China
  2. Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi 830000, China
  • Online:2024-11-16 Published:2024-11-13
  • About author:WANG Ziyang, born in 1996, postgraduate. His main research interests include reinforcement learning and cyberspace security.
    WANG Jia, born in 1987, Ph.D, associate professor, is a member of CCF (No.K8521M). Her main research interests include resource allocation in clouds, task scheduling in big data and cyberspace security.
  • Supported by:
    National Science and Technology Major Project(2022ZD0115803),Key Research and Development Program of Xinjiang Uygur Autonomous Region(2022B01008),National Natural Science Foundation of China(62363032),Natural Science Foundation of Xinjiang Uygur Autonomous Region(2023D01C20),Scientific Research Foundation of Higher Education(XJEDU2022P011) and “Heaven Lake Doctor” Project(202104120018).

Abstract: Penetration path planning is the first step of penetration testing and is essential for making penetration testing intelligent. Existing studies on penetration path planning usually model penetration testing as a fully observable process, which cannot accurately describe real penetration testing, where the environment is only partially observable. Following the wide application of reinforcement learning in penetration testing, this paper models penetration testing as a partially observable Markov decision process (POMDP) to simulate practical penetration testing more accurately. In general, the fully connected policy network and evaluation network in PPO cannot extract features effectively under partial observability. This paper therefore proposes an improved PPO algorithm, RPPO, which combines fully connected layers and long short-term memory (LSTM) in both the policy network and the evaluation network. In addition, a new objective function update rule is designed to improve robustness and convergence. Experimental results show that the proposed RPPO converges faster than the A2C, PPO and NDSPI-DQN algorithms, reducing the number of iterations to convergence by 21.21%, 28.64% and 22.85%, respectively. Meanwhile, RPPO gains about 66.01%, 58.61% and 132.64% more cumulative reward, making it better suited to larger-scale network environments with more than fifty hosts.
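The abstract names two ingredients of RPPO: the PPO clipped surrogate objective and LSTM layers inserted between the fully connected layers of the policy and evaluation networks so that the agent can aggregate a history of partial observations. The paper's actual architecture and modified objective are not given on this page; the following NumPy sketch only illustrates those two standard ingredients, and all dimensions, weight names and the `forward` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: estimated A(s, a).
    """
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()

class LSTMCell:
    """Minimal LSTM cell; all four gates computed from [h, x] concatenation."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([h, x]) + self.b
        i, f, o, g = np.split(z, 4)
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        c = sig(f) * c + sig(i) * np.tanh(g)   # update cell memory
        h = sig(o) * np.tanh(c)                # emit hidden state
        return h, c

def forward(obs_seq, cell, W_fc, W_pi, W_v):
    """FC feature layer -> LSTM over the observation history -> policy/value heads."""
    h = c = np.zeros(cell.hid_dim)
    for obs in obs_seq:
        feat = np.tanh(W_fc @ obs)    # fully connected feature extraction
        h, c = cell.step(feat, h, c)  # recurrent memory over partial observations
    return W_pi @ h, float(W_v @ h)   # action logits and state-value estimate
```

The recurrent state is what distinguishes this from plain PPO: under partial observability, the LSTM hidden state summarizes everything scanned so far (e.g. hosts and services already discovered), whereas a purely feed-forward network sees only the current observation.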

Key words: Penetration testing, Penetration path planning, Reinforcement learning, Proximal policy optimization, Long short-term memory network

CLC Number: TP309