Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231200165-6. doi: 10.11896/jsjkx.231200165
王紫阳, 王佳, 熊明亮, 王文涛
WANG Ziyang, WANG Jia, XIONG Mingliang, WANG Wentao
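Abstract: Penetration path planning is the first step of penetration testing and is essential to automating it. Most existing work on penetration path planning models penetration testing as a fully observable, idealized process, which fails to reflect the partial observability of real penetration tests. Given the wide use of reinforcement learning in penetration testing, this paper models the penetration testing process as a partially observable Markov decision process (POMDP) so that it simulates real penetration testing more accurately. On this basis, the paper addresses a limitation of the PPO algorithm: because PPO fits its policy and value functions with fully connected layers, it cannot extract effective features from a partially observable space. An improved PPO algorithm, RPPO, is proposed, in which both the policy network and the evaluation network combine fully connected layers with an LSTM structure to improve feature extraction in unknown environments. A new objective function update method is also given to enhance the robustness and convergence of the algorithm. Experimental results show that, across different network scenarios, compared with the existing A2C, PPO, and NDSPI-DQN algorithms, RPPO reduces the number of episodes needed to converge by 21.21%, 28.64%, and 22.85%, and increases the cumulative reward by 66.01%, 58.61%, and 132.64%, respectively, making it better suited to larger network environments with more than 50 hosts.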
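The idea of combining fully connected layers with an LSTM can be illustrated with a minimal sketch. It is not the authors' RPPO implementation: the class name `RecurrentNet`, the layer sizes, the random initialization, and the single-layer layout are all assumptions made for illustration, and the paper's new objective function update is not shown because the abstract does not specify it. The sketch only shows why a recurrent state helps under partial observability: the same observation can produce a different output depending on what was observed before.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(rng, n_out, n_in):
    """Random weights (one row per output unit) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Dense layer: weighted sum of inputs plus bias for each output unit."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class RecurrentNet:
    """Hypothetical FC -> LSTM cell -> FC network (forward pass only, no training)."""

    def __init__(self, obs_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.fc = init_layer(rng, hidden_dim, obs_dim)
        # One dense layer yields all four LSTM gates from [features, previous h].
        self.gates = init_layer(rng, 4 * hidden_dim, 2 * hidden_dim)
        self.head = init_layer(rng, out_dim, hidden_dim)

    def initial_state(self):
        """Zero hidden state h and cell state c."""
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        """Process one observation; return (outputs, new_state)."""
        h, c = state
        n = self.hidden_dim
        x = [math.tanh(v) for v in linear(self.fc, obs)]  # fully connected features
        z = linear(self.gates, x + h)                     # list concatenation
        i = [sigmoid(v) for v in z[0:n]]                  # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]              # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]        # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]          # output gate
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return linear(self.head, h), (h, c)


# Separate policy and evaluation networks, as the abstract describes.
actor = RecurrentNet(obs_dim=3, hidden_dim=4, out_dim=2, seed=1)
critic = RecurrentNet(obs_dim=3, hidden_dim=4, out_dim=1, seed=2)

obs = [1.0, 0.0, 1.0]  # placeholder partial observation
logits, actor_state = actor.step(obs, actor.initial_state())
action_probs = softmax(logits)
value, critic_state = critic.step(obs, critic.initial_state())
```

In a training loop, `actor_state` and `critic_state` would be carried from one step to the next within an episode and reset with `initial_state()` at the start of each new episode.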