Computer Science ›› 2021, Vol. 48 ›› Issue (7): 40-46. doi: 10.11896/jsjkx.210400057
Special Topic: Artificial Intelligence Security
周仕承, 刘京菊, 钟晓峰, 卢灿举
ZHOU Shi-cheng, LIU Jing-ju, ZHONG Xiao-feng, LU Can-ju
Abstract: Penetration testing is a general method for testing network security by simulating hacker attacks. Traditional penetration testing relies mainly on manual work and therefore carries high time and labor costs. Intelligent penetration testing is the future direction of the field, aiming to protect networks more efficiently and at lower cost. Penetration testing path discovery is a key problem in intelligent penetration testing research: its goal is to discover vulnerable nodes in the network and the paths an attacker may follow in time, so that targeted defenses can be deployed. This paper combines deep reinforcement learning with the penetration testing problem, models the penetration testing process as a Markov decision process, and trains an agent in a simulated network environment to perform intelligent penetration testing path discovery. An improved deep reinforcement learning algorithm, Noisy-Double-Dueling DQNper, is proposed, which integrates prioritized experience replay, the double Q-network, the dueling network architecture, and the noisy network mechanism. Comparative experiments in network scenarios of different scales show that the proposed algorithm converges faster than the traditional DQN (Deep Q Network) algorithm and its improved variants, and that it scales to larger network scenarios.
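The abstract only names the components of Noisy-Double-Dueling DQNper; the paper's own implementation is not reproduced here. The following is a minimal, illustrative PyTorch sketch of how such components are commonly combined: noisy linear layers replace ε-greedy exploration, a dueling head separates the state value from action advantages, and the double-Q rule decouples action selection from action evaluation. All class names, layer sizes, and hyperparameters (state_dim, hidden, gamma) are assumptions, not the authors' code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (in the style of Fortunato et al., 2017)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 * bound)
        nn.init.constant_(self.bias_sigma, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _f(x):
        # Noise-shaping function f(x) = sign(x) * sqrt(|x|) for factorised noise.
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        eps_in = self._f(torch.randn(self.in_features))
        eps_out = self._f(torch.randn(self.out_features))
        self.weight_eps.copy_(eps_out.outer(eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x):
        if self.training:
            w = self.weight_mu + self.weight_sigma * self.weight_eps
            b = self.bias_mu + self.bias_sigma * self.bias_eps
        else:
            w, b = self.weight_mu, self.bias_mu
        return F.linear(x, w, b)


class NoisyDuelingDQN(nn.Module):
    """Dueling Q-network whose value/advantage heads use noisy layers for exploration."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Sequential(NoisyLinear(hidden, hidden), nn.ReLU(),
                                   NoisyLinear(hidden, 1))
        self.advantage = nn.Sequential(NoisyLinear(hidden, hidden), nn.ReLU(),
                                       NoisyLinear(hidden, n_actions))

    def forward(self, state):
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
        return v + a - a.mean(dim=1, keepdim=True)

    def reset_noise(self):
        for m in self.modules():
            if isinstance(m, NoisyLinear):
                m.reset_noise()


def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target: the online net picks the action, the target net evaluates it."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In a full training loop this network would be paired with a prioritized replay buffer: the absolute TD error |target − Q(s,a)| of each sampled transition serves as its sampling priority, and the per-sample loss is scaled by the corresponding importance-sampling weight before backpropagation.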
CLC Number: