Computer Science ›› 2024, Vol. 51 ›› Issue (11): 329-339. doi: 10.11896/jsjkx.231000207

• Information Security •

Intelligent Penetration Path Planning and Solution Optimization Based on Reinforcement Learning

LI Cheng’en1, ZHU Dongjun1, HE Jieyan1, HAN Lansheng1,2   

  1 School of Cyber Science and Engineering,Huazhong University of Science and Technology,Wuhan 430000,China
    2 Wuhan Jinyinhu Laboratory,Wuhan 430000,China
  • Received:2023-10-30 Revised:2024-04-16 Online:2024-11-15 Published:2024-11-06
  • Corresponding author: HAN Lansheng(hanlansheng@hust.edu.cn)
  • About author:LI Cheng’en,born in 2001,postgraduate(m202271758@hust.edu.cn).His main research interest is cyberspace security.
    HAN Lansheng,born in 1972,Ph.D,professor,Ph.D supervisor.His main research interests include network security protection,malicious code analysis and big data security.
  • Supported by:
    National Key Research and Development Program of China(2022YFB3103402) and National Natural Science Foundation of China(62072200,62172176,62127808).


Abstract: Against the background of the widespread application of big data technology,the problem that traditional penetration testing relies too heavily on expert experience and manual operation has become increasingly prominent.Automated penetration testing aims to solve this problem so as to discover system security vulnerabilities more accurately and comprehensively,and finding the optimal penetration path is the most important task in automated penetration testing.However,current mainstream research suffers from the following problems:1)it seeks the optimal path in the original solution space,which contains numerous redundant paths,significantly increasing the complexity of problem solving;2)it insufficiently evaluates vulnerability exploitation and positive-reward-obtaining actions.The problem can be simplified and the training process optimized by eliminating the large number of redundant penetration paths and employing exploit sample enhancement and positive reward sample enhancement methods.Therefore,this paper proposes the MASK-SALT-DQN algorithm,which integrates solution space transformation and sample enhancement.It qualitatively and quantitatively analyzes the influence of the proposed algorithm on the model solving process,and introduces the compression ratio to measure the benefit that solution space transformation brings to reaching the model's objective.Experiments indicate that the proportion of redundant solution paths in the original solution space consistently remains above 83%,proving the necessity of solution space transformation.In addition,in the standard scenario,the theoretical compression ratio is 57.2 and the error between the experimental and theoretical compression ratios is only 1.40%.Moreover,compared with baseline methods,MASK-SALT-DQN achieves the best performance in all experimental scenarios,confirming its effectiveness and superiority.
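The "MASK" component suggests that the solution space transformation prunes redundant or invalid actions before action selection. The paper's exact scheme is not reproduced on this page, so the following Python sketch only illustrates the general idea of invalid-action masking in a value-based agent such as DQN; the function and variable names (masked_greedy_action, valid_mask) are illustrative assumptions, not the authors' implementation.

import numpy as np

def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    # q_values holds Q(s,a) estimates for every action in the original
    # action space; valid_mask is True only for actions applicable in the
    # current state (e.g. exploits whose target host is already reachable).
    # Setting invalid entries to -inf restricts the greedy choice to the
    # reduced solution space, which is the effect masking is meant to have.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Toy usage: five candidate actions, of which only actions 1 and 3 are valid.
q = np.array([0.9, 0.2, 0.7, 0.5, 0.8])
mask = np.array([False, True, False, True, False])
print(masked_greedy_action(q, mask))  # prints 3, the best valid action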

Key words: Penetration path planning, Reinforcement learning, Solution space transformation, Sample enhancement, Compression ratio
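For the exploit and positive-reward sample enhancement, one plausible reading is that transitions corresponding to successful exploits or positive rewards are over-represented during experience replay; the paper's actual mechanism may differ. The sketch below biases a standard replay buffer this way; EnhancedReplayBuffer and its boost parameter are hypothetical names introduced only for illustration.

import random
from collections import deque

class EnhancedReplayBuffer:
    def __init__(self, capacity=10000, boost=4):
        # boost controls how many copies a positive-reward transition gets,
        # so rare successful-exploit experience is sampled more often.
        self.buffer = deque(maxlen=capacity)
        self.boost = boost

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.buffer.append(transition)
        if reward > 0:  # duplicate positive-reward (e.g. exploit) samples
            for _ in range(self.boost - 1):
                self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling over the (biased) buffer contents
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))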

CLC Number: TP393