Computer Science, 2021, Vol. 48, Issue (7): 40-46. doi: 10.11896/jsjkx.210400057

Special Issue: Artificial Intelligence Security


Intelligent Penetration Testing Path Discovery Based on Deep Reinforcement Learning

ZHOU Shi-cheng, LIU Jing-ju, ZHONG Xiao-feng, LU Can-ju   

  1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
    Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
  • Received: 2021-04-06 Revised: 2021-05-26 Online: 2021-07-15 Published: 2021-07-02
  • About author: ZHOU Shi-cheng, born in 1995, postgraduate. His main research interests include cyberspace security and reinforcement learning. (zhoushicheng@nudt.edu.cn)
    LIU Jing-ju, born in 1974, professor. Her main research interests include cyberspace security and machine learning.

Abstract: Penetration testing is a general method for testing network security by simulating hacker attacks. Traditional penetration testing relies mainly on manual operation and therefore incurs high time and labor costs. Intelligent penetration testing is the future direction of development, aiming at more efficient and lower-cost network security protection. Penetration testing path discovery is a key problem in intelligent penetration testing research: its purpose is to discover, in a timely manner, the vulnerabilities in a network and the penetration paths an attacker might take, so that targeted defenses can be deployed. This paper combines deep reinforcement learning with penetration testing: the penetration testing process is modeled as a Markov decision process, the agent is trained in simulated network scenarios, and an improved deep reinforcement learning algorithm, Noisy-Double-Dueling DQNper, is proposed. The algorithm integrates prioritized experience replay, double DQN, dueling DQN and the noisy network mechanism. Comparative experiments are conducted on network scenarios of different scales. The results show that the algorithm converges faster than the traditional DQN (Deep Q Network) algorithm and its improved versions, and that it can be applied to larger-scale network scenarios.
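
The four mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the plain-list representation of Q-values, and the hyperparameters (alpha = 0.6, eps = 1e-6) are assumptions chosen for illustration, and the formulas are the standard ones from the prioritized experience replay, double DQN, dueling DQN and noisy network literature.

```python
import random

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, done, gamma, q_next_online, q_next_target):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * q_next_target[best_action]

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) is proportional to (|delta_i| + eps) ** alpha.
    alpha and eps are illustrative values, not taken from the paper."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def noisy_weight(mu, sigma, rng=random):
    """Noisy-network style parameter: w = mu + sigma * noise, noise ~ N(0, 1).
    Exploration comes from the learned sigma rather than from epsilon-greedy."""
    return mu + sigma * rng.gauss(0.0, 1.0)

if __name__ == "__main__":
    # Hypothetical numbers for one transition in a simulated network scenario.
    q_online = dueling_q(1.0, [0.0, 2.0, 4.0])
    q_target = dueling_q(0.5, [1.0, 0.0, 2.0])
    print(double_dqn_target(1.0, False, 0.9, q_online, q_target))
    print(per_probabilities([0.1, 1.0, 2.0]))
```

In a full agent these pieces would sit inside a neural network and a training loop; the sketch only shows how each mechanism changes one step of the standard DQN update.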

Key words: Cybersecurity, Deep reinforcement learning, DQN algorithm, Path discovery, Penetration testing

CLC Number: TP393