Computer Science, 2023, Vol. 50, Issue (11A): 230200162-6. doi: 10.11896/jsjkx.230200162

• Information Security •

Implementation and Verification of Reinforcement Learning Strategy in Automated Red Teaming Testing

CHEN Yufei1, LI Saifei1, ZHANG Lijie2, ZHAO Yue3

  1. 1 College of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
    2 Norla Institute of Technical Physics, Chengdu 610041, China
    3 Science and Technology on Communication Security Laboratory, Chengdu 610041, China
  • Published: 2023-11-09
  • Corresponding author: LI Saifei (lisaifei@swjtu.edu.cn)
  • About author: CHEN Yufei (cyfllab@163.com), born in 1997, postgraduate. His main research interests include cyberspace security and reinforcement learning.
    LI Saifei, born in 1988, Ph.D., engineer. His main research interests include cyberspace security.
  • Supported by:
    Sichuan Science and Technology Planning Project (2021YJ0372), Sichuan Science and Technology Major Special Project (2019ZDZX0007, 2021YFQ0056), and Science and Technology on Communication Security Laboratory Foundation (61421030201022108).

Abstract: Red teaming testing evaluates the security of a network system by simulating the attack behavior of real hackers. However, manual testing is currently costly and adapts poorly. Making red teaming testing intelligent and automated is an active research topic that aims to reduce its cost and to improve the performance and efficiency of cybersecurity assessments. The automated attack strategy is the core of automated red teaming testing: it replaces security experts in deciding which attack techniques to use. In this paper, red teaming attack techniques are mapped to reinforcement learning so that the red teaming testing process is modeled as a Markov decision process, and a fixed strategy and reinforcement learning strategies are implemented through a finite state machine model. The different reinforcement learning strategies are trained and tested in a real network environment to verify their convergence and feasibility. Experimental results show that the strategy based on the SARSA(λ) algorithm outperforms the other reinforcement learning strategies and converges fastest. All three reinforcement learning strategies stably achieve the test objective in the test experiments, and their performance is much better than that of the fixed strategy.

Key words: Cybersecurity, Red teaming, Automated attack strategy, Penetration testing, Reinforcement learning
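
The abstract names SARSA(λ) as the best-performing strategy but does not give the state, action, or reward design. The following is only a minimal sketch of tabular SARSA(λ) with replacing eligibility traces on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the four-phase chain environment, the `CORRECT` action table, the reward values, and the hyperparameters.

```python
import random

# Hypothetical toy environment (not from the paper): a chain of attack phases.
N_STATES = 4           # states 0..2 are non-terminal phases, state 3 is the goal
N_ACTIONS = 3          # abstract "techniques" available in every phase
CORRECT = [1, 0, 2]    # assumed technique that advances from each phase
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == CORRECT[state]:
        nxt = state + 1
        if nxt == GOAL:
            return nxt, 10.0, True     # assumed reward for reaching the goal
        return nxt, -1.0, False        # assumed cost of every other step
    return state, -1.0, False          # a failed technique leaves the phase unchanged


def epsilon_greedy(q, state, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return row.index(max(row))         # ties resolve to the lowest index


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [q[s].index(max(q[s])) for s in range(GOAL)]


def train(episodes=500, alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    """Tabular SARSA(lambda) with replacing eligibility traces."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        e = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # traces reset per episode
        s = 0
        a = epsilon_greedy(q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q, s2, eps, rng)
            # On-policy TD error: bootstrap from the action actually chosen next.
            target = r if done else r + gamma * q[s2][a2]
            delta = target - q[s][a]
            e[s][a] = 1.0                                  # replacing trace
            for i in range(N_STATES):
                for j in range(N_ACTIONS):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam                 # decay every trace
            s, a = s2, a2
    return q


if __name__ == "__main__":
    q_table = train()
    print("greedy policy:", greedy_policy(q_table))
```

The eligibility traces let a single reward update every recently visited state-action pair, which is the usual explanation for why SARSA(λ) can converge in fewer episodes than one-step methods. Whether that is the reason for the result reported here cannot be determined from the abstract alone.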

CLC Number:

  • TP393