计算机科学 (Computer Science), 2023, Vol. 50, Issue 7: 308-316. doi: 10.11896/jsjkx.220500101
ZENG Qingwei, ZHANG Guomin, XING Changyou, SONG Lihua
Abstract: Intelligent attack path discovery is a key technique for automated penetration testing, but existing methods suffer from exponentially growing state and action spaces and sparse rewards, which make the learning algorithms hard to converge. To address this, an intelligent attack path discovery method based on hierarchical reinforcement learning, iPathD (Intelligent Path Discovery), is proposed. iPathD models the attack path discovery process as a hierarchical Markov decision process, in which the upper layer describes penetration path discovery across hosts and the lower layer describes attack path discovery within a single host; on this basis, a hierarchical reinforcement learning algorithm for attack path discovery is designed and implemented. Experimental results show that, compared with methods based on DQN (Deep Q-Network) and its improved variants, iPathD discovers attack paths faster and more effectively; its advantage grows as the number of vulnerabilities per host increases, and it remains applicable to large-scale network scenarios.
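The two-level decomposition described above can be illustrated with a minimal tabular sketch: an upper-level policy chooses which host to target, and a lower-level policy chooses an exploit inside the chosen host, each learned with its own Q-table. This is not the paper's iPathD algorithm; the host names, vulnerability labels, and rewards below are invented for illustration only.

```python
import random

# Toy network: each host exposes candidate vulnerabilities, one of which
# actually works. All names and values here are illustrative assumptions.
HOSTS = {
    "web": {"vulns": ["cve_a", "cve_b"], "working": "cve_b"},
    "db":  {"vulns": ["cve_c", "cve_d"], "working": "cve_d"},
}

def lower_level_episode(host, q_low, alpha=0.5, eps=0.1):
    """Lower layer: epsilon-greedy tabular Q-learning over exploits
    inside a single host (reduced to a one-step bandit for brevity).
    Returns 1.0 on a successful compromise, 0.0 otherwise."""
    vulns = HOSTS[host]["vulns"]
    if random.random() < eps:
        action = random.choice(vulns)
    else:
        action = max(vulns, key=lambda v: q_low.get((host, v), 0.0))
    reward = 1.0 if action == HOSTS[host]["working"] else 0.0
    key = (host, action)
    q_low[key] = q_low.get(key, 0.0) + alpha * (reward - q_low.get(key, 0.0))
    return reward

def train(episodes=2000, seed=0, alpha=0.5, eps=0.1):
    """Upper layer: epsilon-greedy choice of the next host to attack;
    the reward it sees is the outcome of the lower-level sub-episode."""
    random.seed(seed)
    q_low, q_high = {}, {}
    for _ in range(episodes):
        if random.random() < eps:
            host = random.choice(list(HOSTS))
        else:
            host = max(HOSTS, key=lambda h: q_high.get(h, 0.0))
        r = lower_level_episode(host, q_low)
        q_high[host] = q_high.get(host, 0.0) + alpha * (r - q_high.get(host, 0.0))
    return q_low, q_high
```

Because each layer learns over its own much smaller action set (hosts at the top, per-host exploits at the bottom), the joint state-action space no longer grows as the product of the two, which is the intuition behind the convergence benefit claimed above.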