Computer Science ›› 2023, Vol. 50 ›› Issue (7): 308-316. doi: 10.11896/jsjkx.220500101

• Information Security •

Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning

ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua   

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2022-05-12 Revised: 2022-08-18 Online: 2023-07-15 Published: 2023-07-05
  • About author: ZENG Qingwei, born in 1995, postgraduate. His main research interest is cyberspace security. ZHANG Guomin, born in 1979, Ph.D, professor, master supervisor. His main research interests include software-defined networking, network security, network measurement, and distributed systems.
  • Supported by:
    National Natural Science Foundation of China (62172432).

Abstract: Intelligent attack path discovery is a key technology for automated penetration testing, but existing methods suffer from exponentially growing state and action spaces and from sparse rewards, both of which make the learning algorithms difficult to converge. To address this, an intelligent attack path discovery method based on hierarchical reinforcement learning (iPathD) is proposed. iPathD models attack path discovery as a two-layer Markov decision process, in which the upper layer handles penetration path discovery across hosts and the lower layer handles attack path discovery inside a single host. On this basis, a hierarchical reinforcement learning algorithm for attack path discovery is designed and implemented. Experimental results show that, compared with methods based on traditional deep Q-learning (DQN) and its improved variants, iPathD discovers attack paths faster and more effectively, and its advantage grows as the number of vulnerabilities per host increases, making it well suited to large-scale network scenarios.
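To make the two-layer decomposition concrete, below is a minimal runnable Python sketch of a hierarchical agent on a toy network: an upper-layer policy chooses which host to target next, and a lower-layer policy chooses an exploit on that host, each trained with one-step tabular Q-learning. All names and numbers (HOSTS, EXPLOITS, success probabilities, rewards) are hypothetical, and tabular Q-learning stands in for the paper's DQN-style learners; this is an illustration of the idea, not the authors' iPathD implementation.

import random
from collections import defaultdict

# --- hypothetical toy network: hosts, their exploits, success rates ---
HOSTS = ["web", "db", "file"]
EXPLOITS = {
    "web":  {"cve_a": 0.9, "cve_b": 0.2},
    "db":   {"cve_c": 0.7},
    "file": {"cve_d": 0.5, "cve_e": 0.4},
}
GOAL = "db"   # host holding the attacker's objective

def pick(q, state, actions, eps):
    """Epsilon-greedy action selection over a tabular Q-function."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def train(episodes=2000, eps=0.1, alpha=0.5, gamma=0.9):
    q_hi = defaultdict(float)  # upper layer: (owned hosts, target host) -> value
    q_lo = defaultdict(float)  # lower layer: ((host, owned), exploit)   -> value
    for _ in range(episodes):
        owned = frozenset()            # hosts compromised so far
        for _ in range(20):            # step budget per episode
            host = pick(q_hi, owned, HOSTS, eps)            # upper-layer choice
            exploits = list(EXPLOITS[host])
            exp = pick(q_lo, (host, owned), exploits, eps)  # lower-layer choice
            ok = random.random() < EXPLOITS[host][exp]
            done = ok and host == GOAL
            reward = 10.0 if done else -0.1  # sparse reward: only the goal pays
            nxt = owned | {host} if ok else owned
            # one-step Q-learning backup at each layer of the hierarchy
            best_hi = 0.0 if done else max(q_hi[(nxt, h)] for h in HOSTS)
            q_hi[(owned, host)] += alpha * (reward + gamma * best_hi
                                            - q_hi[(owned, host)])
            best_lo = 0.0 if done else max(q_lo[((host, nxt), e)]
                                           for e in exploits)
            q_lo[((host, owned), exp)] += alpha * (reward + gamma * best_lo
                                                   - q_lo[((host, owned), exp)])
            owned = nxt
            if done:
                break
    return q_hi, q_lo

if __name__ == "__main__":
    q_hi, _ = train()
    # From an empty foothold, the learned upper-layer policy should
    # prefer the goal host in this one-hop toy topology.
    print(max(HOSTS, key=lambda h: q_hi[(frozenset(), h)]))

The sparse reward (only reaching the goal host pays) mirrors the convergence difficulty described in the abstract, while the hierarchy shrinks each layer's action set (hosts at the top, per-host exploits below); this decomposition is the mechanism by which a hierarchical approach such as iPathD can converge faster than a flat DQN over the joint host-exploit action space.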

Key words: Penetration testing, Markov decision process, Hierarchical reinforcement learning, Attack path discovery, DQN algorithm

CLC Number: TP393

[1]ARCE I,MCGRAW G.Guest editors' introduction:Why attacking systems is a good idea[J].IEEE Security & Privacy,2004,2(4):17-19.
[2]ARKIN B,STENDER S,MCGRAW G.Software penetration testing[J].IEEE Security & Privacy,2005,3(1):84-87.
[3]SUTTON R S,BARTO A G.Reinforcement learning:An introduction[M].MIT Press,2018.
[4]SARRAUTE C,BUFFET O,HOFFMANN J.Penetration testing==POMDP solving?[J].arXiv:1306.4714,2013.
[5]SHMARYAHU D,SHANI G,HOFFMANN J,et al.Partially observable contingent planning for penetration testing[C]//IWAISe:First International Workshop on Artificial Intelligence in Security.2017.
[6]SARRAUTE C,BUFFET O,HOFFMANN J.POMDPs make better hackers:Accounting for uncertainty in penetration testing[C]//Twenty-Sixth AAAI Conference on Artificial Intelligence.2012.
[7]ZENNARO F M,ERDODI L.Modeling penetration testing with reinforcement learning using capture-the-flag challenges and tabular Q-learning[J].arXiv:2005.12632,2020.
[8]ZHOU T Y,ZANG Y C,ZHU J H,et al.NIG-AP:a new method for automated penetration testing[J].Frontiers of Information Technology & Electronic Engineering,2019,20(9):1277-1288.
[9]HU Z,BEURAN R,TAN Y.Automated Penetration Testing Using Deep Reinforcement Learning[C]//IEEE European Symposium on Security and Privacy Workshops.2020.
[10]ZHOU S,LIU J,HU D,et al.Autonomous Penetration Testing Based on Improved Deep Q-Network[J].Applied Sciences,2021,11:8823.
[11]SCHWARTZ J,KURNIAWATI H.NASim:Network Attack Simulator[Z/OL].https://networkattacksimulator.readthedocs.io/,2019.
[12]SEIFERT C,BETSER M,BLUM W,et al.CyberBattleSim[Z/OL].https://github.com/microsoft/cyberbattlesim,2021.
[13]SCHWARTZ J,KURNIAWATI H.Autonomous penetration testing using reinforcement learning[J].arXiv:1905.05965,2019.
[14]BARTO A G,MAHADEVAN S.Recent advances in hierarchical reinforcement learning[J].Discrete Event Dynamic Systems,2003,13(1/2):341-379.
[15]DAYAN P,HINTON G.Feudal Reinforcement Learning[C]//Proceedings of Advances in Neural Information Processing Systems.San Francisco:Morgan Kaufmann,1993:271-278.
[16]SINGH S.Transfer of Learning by Composing Solutions of Elemental Sequential Tasks[J].Machine Learning,1992,8:323-339.
[17]TAKAHASHI Y,ASADA M.Multi-controller Fusion in Multi-layered Reinforcement Learning[C]//International Conference on Multisensor Fusion and Integration for Intelligent Systems(MFI 2001).Baden-Baden,Germany,2001:7-12.
[18]CHEN T,LU J.Towards analysis of semi-Markov decision processes[C]//Artificial Intelligence and Computational Intelligence(AICI 2010).Berlin,Heidelberg:Springer,2010:41-48.
[19]MAHADEVAN S,MARCHALLECK N,DAS T,et al.Self-improving Factory Simulation Using Continuous-time Average-reward Reinforcement Learning[C]//Proceedings of the 14th International Conference on Machine Learning.Nashville,Tennessee,USA,1997:202-210.
[20]BACKES M,HOFFMANN J,KÜNNEMANN R,et al.Simulated penetration testing and mitigation analysis[J].arXiv:1705.05088,2017.
[21]CHOWDHARY A,HUANG D,MAHENDRAN J S,et al.Autonomous security analysis and penetration testing[C]//2020 16th International Conference on Mobility,Sensing and Networking(MSN).2020:508-515.