Computer Science ›› 2025, Vol. 52 ›› Issue (3): 400-406. doi: 10.11896/jsjkx.231200074

• Information Security •

Windows Domain Penetration Testing Attack Path Generation Based on Deep Reinforcement Learning

HUO Xingpeng, SHA Letian, LIU Jianwen, WU Shang, SU Ziyue

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2023-12-11 Revised: 2024-04-26 Online: 2025-03-15 Published: 2025-03-07
  • Corresponding author: SHA Letian (1528652674@qq.com)
  • About the author: (1021041524@njupt.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (62072253)

Abstract: The Windows domain is a prime target of intranet penetration testing, yet the scenarios and methods of Windows domain penetration differ greatly from those of conventional intranet penetration, so existing research on intelligent path discovery does not fit the intricacies of Windows domain environments. To strengthen the security protection of Windows domains, an automatic generation method for Windows domain penetration testing paths based on deep reinforcement learning is proposed. First, the Windows domain penetration testing scenario is modeled as a Markov decision process, and a simulator suitable for reinforcement learning is built on OpenAI's Gymnasium. Second, to address insufficient exploration in large action and observation spaces, prior knowledge is leveraged to prune redundant actions and compress the invalid portion of the observation space. Finally, a Windows domain environment is deployed on a small server using virtual machine technology, and with NDD-DQN as the base algorithm, the whole pipeline from information collection and model construction to path generation is automated in a real environment. Experimental results show that the proposed method achieves good simulation and training performance in complex, real-world Windows domain environments.

Key words: Penetration testing, Windows domain, Deep reinforcement learning, DQN algorithm, Attack path
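The Markov-decision-process formulation above can be illustrated with a toy environment that follows the Gymnasium `reset`/`step` convention (observation, reward, terminated, truncated, info). All hosts, actions, and reward values below are hypothetical examples for illustration, not the authors' actual simulator, and the class is plain Python rather than a registered Gymnasium environment:

```python
# Toy sketch of a Windows-domain pentest MDP in the Gymnasium style.
# Hosts, credentials, and reward values are illustrative assumptions.

class ToyDomainPentestEnv:
    """State: compromised hosts + credential flag. Goal: reach the DC."""

    ACTIONS = {
        0: ("exploit", "web01"),      # gain a foothold on an exposed web server
        1: ("dump_creds", "web01"),   # harvest cached domain credentials
        2: ("lateral_move", "dc01"),  # use creds against the domain controller
    }

    def __init__(self):
        self.reset()

    def reset(self, seed=None):
        self.compromised = set()
        self.have_creds = False
        self.steps = 0
        return self._obs(), {}

    def _obs(self):
        return (frozenset(self.compromised), self.have_creds)

    def step(self, action):
        self.steps += 1
        verb, target = self.ACTIONS[action]
        reward, terminated = -1.0, False  # per-step cost favors short paths
        if verb == "exploit" and target == "web01":
            self.compromised.add("web01")
        elif verb == "dump_creds" and "web01" in self.compromised:
            self.have_creds = True
        elif verb == "lateral_move" and self.have_creds:
            self.compromised.add("dc01")
            reward, terminated = 100.0, True  # domain controller owned
        truncated = self.steps >= 50
        return self._obs(), reward, terminated, truncated, {}
```

In this sketch the optimal attack path is exploit → dump_creds → lateral_move; a DQN-style agent trained against such an environment learns to emit exactly that ordering, which is the sense in which path generation reduces to policy learning.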

CLC Number: TP393
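The action-pruning idea from the abstract, i.e. using prior knowledge to cut redundant actions before training, can be sketched as a simple feasibility filter: an exploit action is kept for a host only if the host actually exposes the service that exploit targets. The exploit names and service mappings below are made-up examples, not the paper's actual knowledge base:

```python
# Sketch of prior-knowledge action pruning: drop exploit actions whose
# required service is absent on the target host, shrinking the per-host
# action space before DQN training. All entries are illustrative.

EXPLOIT_REQUIRES = {          # exploit name -> service it targets
    "ms17_010": "smb",
    "zerologon": "netlogon",
    "tomcat_rce": "http",
}

def prune_actions(host_services, candidate_exploits):
    """Keep only exploits whose required service is exposed on the host."""
    return [e for e in candidate_exploits
            if EXPLOIT_REQUIRES.get(e) in host_services]

# A file server exposing only SMB keeps 1 of 3 candidate exploits, so the
# agent never wastes exploration on actions that cannot succeed there.
```

The same principle drives the observation-space compression described in the abstract: fields that prior knowledge marks as irrelevant to any remaining action can be dropped from the state encoding.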