Computer Science, 2024, Vol. 51, Issue (11): 329-339. doi: 10.11896/jsjkx.231000207
LI Cheng'en (李成恩)1, ZHU Dongjun (朱东君)1, HE Jieyan (贺杰彦)1, HAN Lansheng (韩兰胜)1,2
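Abstract: With big data technology now widely deployed, the over-reliance of traditional penetration testing on expert experience and manual operation has become increasingly apparent. Automated penetration testing aims to address this problem so that system security vulnerabilities can be discovered more accurately and comprehensively, and finding the optimal penetration path is its most important task. However, current mainstream research attempts to plan the optimal path in the original solution space, which contains a large number of redundant paths, and this greatly increases the complexity of solving the problem. In addition, current research does not adequately evaluate vulnerability-exploitation actions and actions that obtain positive rewards. By eliminating the many redundant penetration paths and adopting an exploit sample augmentation method and a positive-reward sample augmentation method, the problem can be simplified and the training process optimized. On this basis, the MASK-SALT-DQN algorithm is proposed, combining solution space transformation with sample augmentation. The effect of the method on the model's solving process is analyzed both qualitatively and quantitatively, and a compression ratio is used to measure the benefit that solution space transformation brings to the model in completing its goal. Experiments show that the proportion of redundant solution paths in the original solution space always remains above 83%, which demonstrates the necessity of solution space transformation. Moreover, in the standard scenario the theoretical compression ratio is 57.2 and the error between the experimental and theoretical compression ratios is only 1.40%. Compared with the baseline methods, MASK-SALT-DQN performs best in all experimental scenarios, which demonstrates its effectiveness and advancement.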
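The abstract does not spell out how MASK-SALT-DQN is implemented, so the following is only a generic sketch of the two ideas it names: restricting action selection to a reduced set of actions (invalid-action masking) and oversampling positive-reward transitions in the replay data. The function names, the mask layout, and the `copies` parameter are all hypothetical and are not taken from the paper.

```python
def masked_greedy_action(q_values, valid_mask):
    """Return the index of the highest-Q action among those marked valid.

    Assumption: valid_mask[i] is True when action i is kept after the
    redundant actions have been filtered out (hypothetical layout).
    """
    valid = [i for i, ok in enumerate(valid_mask) if ok]
    if not valid:
        raise ValueError("no valid action in this state")
    return max(valid, key=lambda i: q_values[i])


def augment_positive(transitions, copies=3):
    """Repeat positive-reward transitions so they are sampled more often.

    Assumption: each transition is (state, action, reward, next_state);
    simple duplication stands in for whatever augmentation the paper uses.
    """
    augmented = []
    for transition in transitions:
        reward = transition[2]
        repeat = copies if reward > 0 else 1
        augmented.extend([transition] * repeat)
    return augmented


# Toy Q-values for four actions; actions 0 and 3 are masked out.
q = [0.9, 0.2, 0.5, 0.7]
mask = [False, True, True, False]
best = masked_greedy_action(q, mask)

# Toy replay data: one costly scan and one rewarded exploit.
buffer = [("s0", "scan", -1.0, "s1"), ("s1", "exploit", 10.0, "s2")]
augmented = augment_positive(buffer, copies=3)
```

In this toy run the unmasked maximum (action 0) is never considered, which shows how masking shrinks the space the agent searches, and the rewarded exploit transition appears three times in the augmented data.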