Computer Science ›› 2023, Vol. 50 ›› Issue (7): 308-316. doi: 10.11896/jsjkx.220500101

• Information Security •

Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning

ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua

  1. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2022-05-12  Revised: 2022-08-18  Online: 2023-07-15  Published: 2023-07-05
  • Corresponding author: ZHANG Guomin (40519667@qq.com)
  • About author: ZENG Qingwei, born in 1995, postgraduate (943919527@qq.com). His main research interest is cyberspace security. ZHANG Guomin, born in 1979, Ph.D., professor, master supervisor. His main research interests include software-defined networking, network security, network measurement, and distributed systems.
  • Supported by:
    National Natural Science Foundation of China (62172432).

Abstract: Intelligent attack path discovery is a key technology for automated penetration testing, but existing methods suffer from exponentially growing state and action spaces and sparse rewards, which make their algorithms difficult to converge. To this end, an intelligent attack path discovery method based on hierarchical reinforcement learning, iPathD, is proposed. iPathD models the attack path discovery process as a hierarchical Markov decision process that separately describes upper-layer inter-host penetration path discovery and lower-layer intra-host attack path discovery, and on this basis an attack path discovery algorithm based on hierarchical reinforcement learning is proposed and implemented. Experimental results show that, compared with traditional methods based on deep Q-learning (DQN) and its improved variants, iPathD discovers attack paths faster and more effectively; its advantage grows as the number of vulnerabilities per host increases, and it is applicable to large-scale network scenarios.
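The abstract's two-level decomposition — an upper-layer MDP that chooses which host to pivot to next, and a lower-layer MDP that searches for a working exploit inside a single host — can be sketched with two nested tabular Q-learners. This is a minimal sketch under made-up assumptions (a hypothetical four-host network, toy rewards and hyperparameters); iPathD itself uses deep hierarchical reinforcement learning, not these tables:

```python
import random

random.seed(0)

# Toy network (an illustrative assumption, not the paper's testbed): each host
# exposes several vulnerabilities, exactly one of which grants access, and the
# attacker must also choose the right host to pivot to at each step.
NETWORK = {
    "web":      {"vulns": ["v1", "v2", "v3"], "working": "v2", "next": ["app", "honeypot"]},
    "app":      {"vulns": ["v4", "v5"],       "working": "v5", "next": ["db"]},
    "honeypot": {"vulns": ["v6", "v7"],       "working": "v6", "next": []},  # dead end
    "db":       {"vulns": ["v8", "v9"],       "working": "v8", "next": []},  # goal
}
START, GOAL = "web", "db"

def eps_greedy(qs, eps):
    """Pick a random action with probability eps, else the highest-valued one."""
    return random.choice(list(qs)) if random.random() < eps else max(qs, key=qs.get)

def exploit_host(host, q_low, eps=0.2, alpha=0.5):
    """Lower-level task: try exploits on a single host until one succeeds.
    Returns the number of attempts, used as a cost by the upper level."""
    qs = q_low.setdefault(host, {v: 0.0 for v in NETWORK[host]["vulns"]})
    for attempt in range(1, 101):
        v = eps_greedy(qs, eps)
        r = 1.0 if v == NETWORK[host]["working"] else -0.1
        qs[v] += alpha * (r - qs[v])  # one-step (bandit-style) value update
        if r > 0:
            return attempt
    return 100

def run_episode(q_up, q_low, eps=0.2, alpha=0.5, gamma=0.9):
    """Upper-level task: choose which reachable host to pivot to next."""
    host = START
    exploit_host(host, q_low, eps)
    while True:
        if host == GOAL:
            return True
        nxt = NETWORK[host]["next"]
        if not nxt:  # pivoted into a dead end
            return False
        qs = q_up.setdefault(host, {h: 0.0 for h in nxt})
        a = eps_greedy(qs, eps)
        attempts = exploit_host(a, q_low, eps)  # invoke the lower-level learner
        if a == GOAL:
            r, future = 1.0 - 0.01 * attempts, 0.0
        elif not NETWORK[a]["next"]:
            r, future = -1.0, 0.0
        else:
            nq = q_up.setdefault(a, {h: 0.0 for h in NETWORK[a]["next"]})
            r, future = -0.01 * attempts, max(nq.values())
        qs[a] += alpha * (r + gamma * future - qs[a])  # Q-learning update
        host = a

q_up, q_low = {}, {}
for _ in range(300):
    run_episode(q_up, q_low)

# Greedy policies after training: pivot web -> app (not the honeypot), and
# exploit the working vulnerability on each visited host.
best_pivot = max(q_up["web"], key=q_up["web"].get)
best_exploit = max(q_low["web"], key=q_low["web"].get)
print(best_pivot, best_exploit)
```

The point of the hierarchy is visible even in this toy: the upper table only ranks hosts and each lower table only ranks vulnerabilities on one host, so neither grows with the product of the two spaces — the combinatorial blow-up the abstract attributes to flat DQN-style methods.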

Key words: Penetration testing, Markov decision process, Hierarchical reinforcement learning, Attack path discovery, DQN algorithm

CLC Number: 

  • TP393