Computer Science ›› 2025, Vol. 52 ›› Issue (3): 400-406. doi: 10.11896/jsjkx.231200074
• Information Security •
霍兴鹏, 沙乐天, 刘建文, 吴尚, 苏子悦
HUO Xingpeng, SHA Letian, LIU Jianwen, WU Shang, SU Ziyue
Abstract: Windows domains are regarded as a prime target in intranet penetration testing, yet the scenarios and methods of Windows domain penetration testing differ substantially from those of conventional intranet penetration. As a result, current research on intelligent attack-path discovery does not transfer to Windows domain environments. To strengthen the security of Windows domains, this paper proposes an automated method, based on deep reinforcement learning, for generating Windows domain penetration testing paths. First, the Windows domain penetration testing scenario is modeled as a Markov decision process, and a simulator suited to reinforcement learning is built with OpenAI's Gymnasium. Second, to address insufficient exploration in large action and observation spaces, prior knowledge is used to prune redundant actions and to compress the invalid portions of the observation space. Finally, a Windows domain environment is deployed on a small server using virtual machines, and with NDD-DQN as the base algorithm the whole pipeline, from information gathering through model construction to path generation, is automated in a real environment. Experimental results show that the proposed method achieves good simulation and training performance in a real, complex Windows environment.
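To make the modeling steps in the abstract concrete, the following minimal Python sketch shows what such a Gymnasium environment might look like: a toy Windows-domain MDP in which per-host actions are masked using prior knowledge, so the agent does not waste exploration on redundant actions. All names here (WindowsDomainEnv, _action_mask, the reward values, the per-host action count) are illustrative assumptions, not the paper's actual implementation.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class WindowsDomainEnv(gym.Env):
    """Toy MDP: an agent tries to compromise every host in a small simulated Windows domain."""

    def __init__(self, num_hosts=5, actions_per_host=4):
        self.num_hosts = num_hosts
        self.k = actions_per_host                      # e.g. scan / exploit / dump creds / move laterally
        self.num_actions = num_hosts * actions_per_host
        # Observation: one "compromised" flag per host; hosts known to be
        # unreachable could be dropped here to compress the observation space.
        self.observation_space = spaces.MultiBinary(num_hosts)
        self.action_space = spaces.Discrete(self.num_actions)

    def _action_mask(self):
        # Prior knowledge prunes redundant actions, e.g. anything aimed
        # at a host that is already compromised.
        mask = np.ones(self.num_actions, dtype=np.int8)
        for host in range(self.num_hosts):
            if self.state[host]:
                mask[host * self.k:(host + 1) * self.k] = 0
        return mask

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(self.num_hosts, dtype=np.int8)
        self.state[0] = 1                              # initial foothold
        return self.state.copy(), {"action_mask": self._action_mask()}

    def step(self, action):
        host = action // self.k
        self.state[host] = 1                           # simplified: every allowed action succeeds
        terminated = bool(self.state.all())            # full domain compromise reached
        reward = 10.0 if terminated else -1.0          # step cost rewards shorter attack paths
        return self.state.copy(), reward, terminated, False, {"action_mask": self._action_mask()}

A DQN-family agent such as the NDD-DQN named in the abstract could read action_mask from the info dictionary and restrict its action selection to valid entries; observation compression would similarly shrink the network's input by dropping fields that prior knowledge marks as invalid.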