Computer Science ›› 2024, Vol. 51 ›› Issue (11): 329-339.doi: 10.11896/jsjkx.231000207

• Information Security •

Intelligent Penetration Path Planning and Solution Optimization Based on Reinforcement Learning

LI Cheng’en1, ZHU Dongjun1, HE Jieyan1, HAN Lansheng1,2   

  1 School of Cyber Science and Engineering,Huazhong University of Science and Technology,Wuhan 430000,China
    2 Wuhan Jinyinhu Laboratory,Wuhan 430000,China
  • Received:2023-10-30 Revised:2024-04-16 Online:2024-11-15 Published:2024-11-06
  • About author:LI Cheng’en,born in 2001,postgraduate.His main research interest is cyberspace security.
    HAN Lansheng,born in 1972,Ph.D,professor,Ph.D supervisor.His main research interests include network security protection,malicious code analysis and big data security.
  • Supported by:
    National Key Research and Development Program of China(2022YFB3103402) and National Natural Science Foundation of China(62072200,62172176,62127808).

Abstract: Against the background of the widespread application of big data technology, traditional penetration testing's heavy reliance on expert experience and manual operation has become an increasingly significant problem. Automated penetration testing aims to solve this problem so as to discover system security vulnerabilities more accurately and comprehensively. Finding the optimal penetration path is the most important task in automated penetration testing. However, current mainstream research suffers from two problems: 1) searching for the optimal path in the original solution space, which contains numerous redundant paths, significantly increases the complexity of problem solving; 2) vulnerability exploitation actions and actions that obtain positive rewards are insufficiently emphasized during training. Problem solving can be optimized by eliminating a large number of redundant penetration paths and by employing exploit sample enhancement and positive-reward sample enhancement. Therefore, this paper proposes the MASK-SALT-DQN algorithm, which integrates solution space transformation with sample enhancement. It qualitatively and quantitatively analyzes the influence of the proposed algorithm on the model-solving process, and introduces the compression ratio to measure the benefit of solution space transformation. Experiments indicate that the proportion of redundant paths in the original solution space consistently remains above 83%, proving the necessity of solution space transformation. In addition, in the standard experimental scenario, the theoretical compression ratio is 57.2, and the error between the experimental compression ratio and the theoretical value is only 1.40%. Moreover, compared with baseline methods, MASK-SALT-DQN achieves the best performance in all experimental scenarios, confirming its effectiveness and superiority.
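Two of the ideas summarized in the abstract are easy to illustrate in isolation: pruning redundant actions from the solution space (commonly realized via invalid action masking over Q-values, as studied in [11]) and over-sampling positive-reward transitions. The sketch below is a minimal, hypothetical illustration of these general techniques; the function names, threshold, and toy numbers are illustrative assumptions and are not taken from the paper's MASK-SALT-DQN implementation.

# Minimal, hypothetical sketch of (1) invalid-action masking, a common way
# to realize solution space transformation by excluding redundant actions
# before the argmax, and (2) naive positive-reward sample enhancement for a
# replay batch. All names, thresholds, and numbers are illustrative
# assumptions, not the paper's actual MASK-SALT-DQN implementation.
import numpy as np

def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Greedy action over valid entries only: Q-values of masked-out
    (redundant) actions are set to -inf before taking the argmax."""
    return int(np.argmax(np.where(valid_mask, q_values, -np.inf)))

def enhance_positive_samples(batch, reward_threshold=0.0, extra_copies=2):
    """Replicate transitions whose reward exceeds a threshold so that
    exploit/positive-reward experiences are replayed more often."""
    enhanced = []
    for transition in batch:
        enhanced.append(transition)
        if transition["reward"] > reward_threshold:
            enhanced.extend([transition] * extra_copies)
    return enhanced

def compression_ratio(original_size: int, transformed_size: int) -> float:
    """Size of the original solution space divided by the transformed one;
    larger values mean more redundant paths were pruned."""
    return original_size / transformed_size

# Toy usage: 6 candidate actions, of which only 2 survive the mask.
q = np.array([0.3, 1.2, -0.5, 0.8, 0.1, 0.9])
mask = np.array([False, True, False, True, False, False])
print(masked_greedy_action(q, mask))  # -> 1 (best Q among valid actions)
print(compression_ratio(6, 2))        # -> 3.0

Setting masked Q-values to -inf, rather than deleting entries, keeps the action indexing stable, which is why this trick is widely used to implement invalid action masking in DQN-style agents.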

Key words: Penetration path planning, Reinforcement learning, Solution space transformation, Sample enhancement, Compression ratio

CLC Number: TP393

References
[1] CUI Y,ZHANG L J,WU H.Automatic Generation Method for Penetration Test Programs Based on Attack Graph[J].Journal of Computer Applications,2010,30(8):2146-2150.
[2] ZENG Q W,ZHANG G M,XING C Y,et al.Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning[J].Computer Science,2023,50(7):308-316.
[3] SARRAUTE C,BUFFET O,HOFFMANN J.POMDPs make better hackers:Accounting for uncertainty in penetration testing[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2012,26(1):1816-1824.
[4] SCHNEIER B.Attack trees[J].Dr.Dobb’s Journal,1999,24(12):21-29.
[5] PHILLIPS C,SWILER L P.A graph-based system for network-vulnerability analysis[C]//Proceedings of the 1998 Workshop on New Security Paradigms.1998:71-79.
[6] SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].MIT Press,2018.
[7] WATKINS C J C H,DAYAN P.Q-learning[J].Machine Learning,1992,8:279-292.
[8] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning[C]//Neural Information Processing Systems Deep Learning Workshops.NIPS,2013.
[9] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[10] MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//International Conference on Machine Learning.PMLR,2016:1928-1937.
[11] HUANG S,ONTAÑÓN S.A closer look at invalid action masking in policy gradient algorithms[C]//Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference.FLAIRS,2022.
[12] YANG W Y,BAI C J,CAI C,et al.Survey on Sparse Reward in Deep Reinforcement Learning[J].Computer Science,2020,47(3):182-191.
[13] SCHWARTZ J,KURNIAWATI H.NASim:Network Attack Simulator[EB/OL].https://networkattacksimulator.readthedocs.io/.
[14] SCHWARTZ J,KURNIAWATI H,EL-MAHASSNI E.POMDP+information decay:Incorporating defender's behaviour in autonomous penetration testing[C]//Proceedings of the International Conference on Automated Planning and Scheduling.2020:235-243.
[15] SARRAUTE C,BUFFET O,HOFFMANN J.Penetration testing == POMDP solving?[C]//Working Notes for the 2011 IJCAI Workshop on Intelligent Security(SecArt).2011.
[16] SHMARYAHU D,SHANI G,HOFFMANN J,et al.Partially observable contingent planning for penetration testing[C]//IWAISe:First International Workshop on Artificial Intelligence in Security.2017.
[17] ZENNARO F M,ERDŐDI L.Modelling penetration testing with reinforcement learning using capture-the-flag challenges:Trade-offs between model-free learning and a priori knowledge[J].IET Information Security,2023,17(3):441-457.
[18] YOUSEFI M,MTETWA N,ZHANG Y,et al.A reinforcement learning approach for attack graph analysis[C]//2018 17th IEEE International Conference On Trust,Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering(TrustCom/BigDataSE).IEEE,2018:212-217.
[19] OU X,GOVINDAVAJHALA S,APPEL A W.MulVAL:A logic-based network security analyzer[C]//USENIX Security Symposium.2005,8:113-128.
[20] HU Z,BEURAN R,TAN Y.Automated penetration testing using deep reinforcement learning[C]//2020 IEEE European Symposium on Security and Privacy Workshops(EuroS&PW).IEEE,2020:2-10.
[21] ZHOU T,ZANG Y,ZHU J,et al.NIG-AP:a new method for automated penetration testing[J].Frontiers of Information Technology & Electronic Engineering,2019,20(9):1277-1288.
[22] ZHOU S,LIU J,HOU D,et al.Autonomous penetration testing based on improved deep Q-network[J].Applied Sciences,2021,11(19):8823.
[23] NGUYEN H V,NGUYEN H N,UEHARA T.Multiple level action embedding for penetration testing[C]//Proceedings of the 4th International Conference on Future Networks and Distributed Systems.2020:1-9.
[24] SULTANA M,TAYLOR A,LI L.Autonomous network cyber offence strategy through deep reinforcement learning[C]//Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III.SPIE,2021:490-502.
[25] LI Q,ZHANG M,SHEN Y,et al.A Hierarchical Deep Reinforcement Learning Model with Expert Prior Knowledge for Intelligent Penetration Testing[J].Computers & Security,2023,132:103358.
[26] BACKES M,HOFFMANN J,KÜNNEMANN R,et al.Towards automated network mitigation analysis[C]//Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing.2019:1971-1978.
[27] SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[C]//International Conference on Learning Representations.ICLR,2016.