Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 230200162-6. doi: 10.11896/jsjkx.230200162
陈宇飞1, 李赛飞1, 张丽杰2, 赵越3
CHEN Yufei1, LI Saifei1, ZHANG Lijie2, ZHAO Yue3
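Abstract: Red team testing evaluates the security of a networked system by simulating the behavior of real attackers. Manual testing, however, is costly and adapts poorly. Making red team testing intelligent and automated is therefore an active research topic, with the aim of lowering its cost and improving the performance and efficiency of network security assessment. The automated attack strategy is the core of automated red team testing: it replaces the security expert in deciding which attack technique to apply. This paper maps red team attack techniques onto reinforcement learning, models the red team testing process as a Markov decision model, and implements both a fixed strategy and reinforcement learning strategies through a finite state machine model. The reinforcement learning strategies are trained and tested in a real network environment to verify their convergence and feasibility. The experimental results show that the strategy based on the SARSA(λ) algorithm outperforms the other reinforcement learning strategies and converges fastest; all three reinforcement learning strategies reliably achieve the test objective in the test experiments and perform far better than the fixed strategy.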
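For readers unfamiliar with SARSA(λ), the following is a minimal tabular sketch with accumulating eligibility traces. It is not the authors' implementation: the five-stage chain environment, its rewards, the `step` and `choose` helpers, and every hyperparameter value are hypothetical and chosen only to keep the example short. Each state stands in loosely for a stage of an attack chain, action 1 advances one stage, and action 0 stays in place.

```python
import random


def sarsa_lambda(n_states=5, n_actions=2, episodes=300, alpha=0.1,
                 gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating traces on a toy chain MDP."""
    rng = random.Random(seed)
    goal = n_states - 1  # last state is terminal (hypothetical "objective reached")
    q = [[0.0] * n_actions for _ in range(n_states)]

    def step(state, action):
        # Hypothetical dynamics: action 1 advances one stage, action 0 stays put.
        nxt = min(state + 1, goal) if action == 1 else state
        reward = 10.0 if nxt == goal else -1.0  # illustrative rewards only
        return nxt, reward, nxt == goal

    def choose(state):
        # Epsilon-greedy selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(q[state])
        return rng.choice([a for a in range(n_actions) if q[state][a] == best])

    for _ in range(episodes):
        traces = [[0.0] * n_actions for _ in range(n_states)]
        state, action = 0, choose(0)
        for _ in range(100):  # cap on steps per episode
            nxt, reward, done = step(state, action)
            nxt_action = choose(nxt)
            # On-policy target: uses the action actually selected next.
            target = reward if done else reward + gamma * q[nxt][nxt_action]
            delta = target - q[state][action]
            traces[state][action] += 1.0  # accumulating trace
            for s in range(n_states):
                for a in range(n_actions):
                    q[s][a] += alpha * delta * traces[s][a]
                    traces[s][a] *= gamma * lam  # decay every trace
            if done:
                break
            state, action = nxt, nxt_action
    return q


q_table = sarsa_lambda()
for s, row in enumerate(q_table):
    print(s, [round(v, 2) for v in row])
```

The eligibility traces let a single temporal-difference error update every recently visited state-action pair, which is the usual explanation for why SARSA(λ) can propagate a delayed reward faster than one-step methods. Whether that accounts for the convergence result reported in the abstract is not something this toy example can show.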