Computer Science, 2021, Vol. 48, Issue (7): 40-46. doi: 10.11896/jsjkx.210400057

Special Topic: Artificial Intelligence Security


  • Corresponding author: LIU Jing-ju (jingjul@aliyun.com)

Intelligent Penetration Testing Path Discovery Based on Deep Reinforcement Learning

ZHOU Shi-cheng, LIU Jing-ju, ZHONG Xiao-feng, LU Can-ju   

  1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
    Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
  • Received:2021-04-06 Revised:2021-05-26 Online:2021-07-15 Published:2021-07-02
  • About author:ZHOU Shi-cheng,born in 1995,postgraduate.His main research interests include cyberspace security and reinforcement learning.(zhoushicheng@nudt.edu.cn)
    LIU Jing-ju,born in 1974,professor.Her main research interests include cyberspace security and machine learning.



Abstract: Penetration testing is a general method of assessing network security by simulating hacker attacks. Traditional penetration testing relies mainly on manual operation, which incurs high time and labor costs. Intelligent penetration testing, which aims to protect networks more efficiently and at lower cost, is the future direction of the field. Penetration testing path discovery is a key problem in intelligent penetration testing research: its purpose is to discover vulnerable nodes in the network and attackers' likely penetration paths in time, so that targeted defenses can be deployed. This paper combines deep reinforcement learning with penetration testing: the penetration testing process is modeled as a Markov decision process, and an agent is trained in simulated network scenarios to discover penetration testing paths. An improved deep reinforcement learning algorithm, Noisy-Double-Dueling DQNper, is proposed, which integrates prioritized experience replay, double DQN, the dueling network architecture, and noisy networks for exploration. Comparative experiments on network scenarios of different scales show that the algorithm converges faster than the traditional DQN (Deep Q Network) algorithm and its improved variants, and that it applies to larger-scale network scenarios.
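Two of the four mechanisms the abstract names, double Q-learning targets and proportional prioritized experience replay, can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: it uses a tabular Q representation and hypothetical parameter names (`alpha`, `gamma`) in place of the paper's deep networks.

```python
import random

class PrioritizedReplay:
    """Proportional prioritized replay: sampling probability ~ priority**alpha."""
    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def push(self, transition, priority=1.0):
        # Evict the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority ** self.alpha)

    def sample(self):
        # Weighted sampling with replacement, proportional to stored priority.
        return random.choices(self.buffer, weights=self.priorities, k=1)[0]

def double_q_target(q1, q2, reward, next_state, gamma=0.9):
    """Double Q target: q1 selects the greedy action, q2 evaluates it.

    Decoupling selection from evaluation reduces the overestimation bias
    of the plain max-based Q-learning target.
    """
    best_action = max(q1[next_state], key=q1[next_state].get)
    return reward + gamma * q2[next_state][best_action]
```

In the paper's setting, a transition would be one penetration-testing step (scan or exploit attempt), and the two Q estimators would be the online and target networks of a DQN rather than tables.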

Key words: Cybersecurity, Deep reinforcement learning, DQN algorithm, Path discovery, Penetration testing

CLC number: TP393