Computer Science ›› 2025, Vol. 52 ›› Issue (3): 400-406.doi: 10.11896/jsjkx.231200074

• Information Security •

Windows Domain Penetration Testing Attack Path Generation Based on Deep Reinforcement Learning

HUO Xingpeng, SHA Letian, LIU Jianwen, WU Shang, SU Ziyue   

  1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received:2023-12-11 Revised:2024-04-26 Online:2025-03-15 Published:2025-03-07
  • Supported by:
    National Natural Science Foundation of China(62072253).

Abstract: The Windows domain is a prime target in intranet penetration. However, the scenarios and methods of Windows domain penetration testing differ substantially from those of conventional intranet penetration, so existing research on intelligent path discovery is not well suited to the intricacies of Windows domain environments. To enhance the security protection of Windows domains, an automatic generation method for Windows domain penetration testing paths based on deep reinforcement learning is proposed. Firstly, the Windows domain penetration testing scenario is modeled as a Markov decision process, and a simulator suitable for reinforcement learning is designed with the Gymnasium toolkit. Secondly, to address the challenge of limited exploration in large action and observation spaces, prior knowledge is leveraged to eliminate redundant actions and streamline the observation space. Lastly, virtual machine technology is used to deploy a Windows domain environment on a small server, and NDD-DQN is used as the base algorithm to automate the whole process from information collection and model construction to path generation in a real environment. Experimental results show that the proposed method achieves effective simulation and training performance in complex, real-world Windows domain environments.
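To illustrate the kind of Markov decision process the abstract describes, the sketch below models a toy Windows-domain scenario with the Gymnasium-style reset()/step() interface (observation, reward, terminated, truncated, info). The class name ToyDomainEnv, the three-host chain topology, and the reward values are illustrative assumptions for exposition, not the paper's actual simulator or reward design.

```python
# Toy sketch (hypothetical): a Windows-domain penetration MDP exposing the
# Gymnasium-style reset()/step() interface, without depending on Gymnasium.
# The observation tracks which hosts are compromised; action i means
# "exploit host i"; host i+1 is only reachable after host i (lateral movement).

class ToyDomainEnv:
    """Hosts 0..2 in a chain; host 2 plays the domain controller (goal)."""

    def __init__(self):
        self.n_hosts = 3
        self.action_space_n = self.n_hosts

    def reset(self, seed=None):
        self.compromised = [False] * self.n_hosts
        return tuple(self.compromised), {}          # (observation, info)

    def step(self, action):
        reward, terminated = -1.0, False            # step cost favors short paths
        reachable = action == 0 or self.compromised[action - 1]
        if reachable and not self.compromised[action]:
            self.compromised[action] = True
            reward = 10.0                           # e.g. a CVSS-derived value
            if action == self.n_hosts - 1:          # domain controller owned
                reward, terminated = 100.0, True
        return tuple(self.compromised), reward, terminated, False, {}

# One episode following the intended attack path 0 -> 1 -> 2.
env = ToyDomainEnv()
obs, _ = env.reset()
total = 0.0
for a in range(env.action_space_n):
    obs, r, terminated, truncated, _ = env.step(a)
    total += r
```

Because the environment exposes the standard step/reset contract, a DQN-family agent (such as the NDD-DQN mentioned above) can be trained against it unchanged; only the observation and action encodings would need to scale with the real domain.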

Key words: Penetration testing, Windows domain, Deep reinforcement learning, DQN algorithm, Attack path

CLC Number: TP393
[1]IBM Documentation[EB/OL].(2023-03-15)[2023-05-28].https://www.ibm.com/docs/en/informix-servers/14.10?topic=architecture-windows-network-domain.
[2]ENGEBRETSON P.The Basics of Hacking and Penetration Testing:Ethical Hacking and Penetration Testing Made Easy[M].Elsevier,2013:1-14.
[3]BAILLIE C,STANDEN M,SCHWARTZ J,et al.CybORG:An Autonomous Cyber Operations Research Gym[J].arXiv:2002.10667,2020.
[4]LI L,FAYAD R,TAYLOR A.CyGIL:A Cyber Gym for Training Autonomous Agents over Emulated Network Systems[J].arXiv:2109.03331,2021.
[5]BROCKMAN G,CHEUNG V,PETTERSSON L,et al.OpenAI Gym[J].arXiv:1606.01540,2016.
[6]SCHWARTZ J,KURNIAWATI H.NASim:Network Attack Simulator[Z/OL].https://networkattacksimulator.readthedocs.io/,2019.
[7]SCHWARTZ J,KURNIAWATI H.Autonomous penetration testing using reinforcement learning[J].arXiv:1905.05965,2019.
[8]MAEDA R,MIMURA M.Automating post-exploitation with deep reinforcement learning[J].Computers & Security,2021,100:102108.
[9]ZHOU S C,LIU J J,ZHONG X F,et al.Intelligent Penetration Testing Path Discovery Based on Deep Reinforcement Learning[J].Computer Science,2021,48(7):40-46.
[10]ZHOU S,LIU J,HOU D,et al.Autonomous Penetration Testing Based on Improved Deep Q-Network[J].Applied Sciences,2021,11(19):8823.
[11]ZENG Q W,ZHANG G M,XING C Y,et al.Intelligent Attack Path Discovery Based on Hierarchical Reinforcement Learning[J].Computer Science,2023,50(7):308-316.
[12]BELLMAN R.A Markovian Decision Process[J].Indiana University Mathematics Journal,1957,6(4):679-684.
[13]BELLMAN R.Dynamic programming[J].Science,1966,153(3731):34-37.
[14]FRANÇOIS-LAVET V,HENDERSON P,ISLAM R,et al.An Introduction to Deep Reinforcement Learning[J].Foundations and Trends® in Machine Learning,2018,11(3/4):219-354.
[15]MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning[J].arXiv:1312.5602,2013.
[16]HASSELT H V,GUEZ A,SILVER D.Deep Reinforcement Learning with Double Q-Learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2016:2094-2100.
[17]WANG Z,SCHAUL T,HESSEL M,et al.Dueling Network Architectures for Deep Reinforcement Learning[C]//Proceedings of The 33rd International Conference on Machine Learning.PMLR,2016:1995-2003.
[18]FORTUNATO M,AZAR M G,PIOT B,et al.Noisy networks for exploration[J].arXiv:1706.10295,2017.
[19]NVD-CVSS v3 Calculator[EB/OL].[2023-04-27].https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator.
[20]Archiveddocs.Active Directory Structure and Storage Technologies:Active Directory[EB/OL].(2014-11-19)[2023-04-25].https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc759186(v=ws.10).