Computer Science, 2024, Vol. 51, Issue (3): 360-367. doi: 10.11896/jsjkx.221200104
王焱, 王天荆, 沈航, 白光伟
WANG Yan, WANG Tianjing, SHEN Hang, BAI Guangwei
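Abstract: Analyzing intrusion intent and penetration behavior from the attacker's perspective is of great significance for guiding network security defense. However, most existing penetration paths are constructed from an instantaneous snapshot of the network environment, which reduces their reference value. To address this problem, this paper proposes an optimal penetration path generation method based on maximum entropy reinforcement learning, which can capture multiple modes of near-optimal behavior through exploration while the network environment changes dynamically. First, the penetration process is modeled according to the attack graph and vulnerability scores, and the threat level of penetration behavior is characterized by quantifying the attack gain. Then, considering the complexity of intrusion behavior, a Soft Q-learning method based on the maximum entropy model is developed, which keeps the process of solving for penetration paths stable by controlling the relative importance of the entropy and the reward. Finally, the method is applied to a dynamically changing test environment to generate highly available penetration paths. Simulation results show that, compared with existing reinforcement-learning-based baseline methods, the proposed method adapts better to the environment and generates higher-gain penetration paths at lower cost.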
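To make the role of the entropy weight concrete, the following is a minimal tabular sketch of Soft Q-learning on a toy attack graph. It is not the authors' implementation: the graph, the reward values (standing in for gains derived from vulnerability scores), the function names, and the hyperparameters `alpha`, `gamma`, and `lr` are all assumptions chosen for illustration. The temperature `alpha` weighs entropy against reward: a larger value spreads probability across several near-optimal paths, while a smaller value approaches ordinary greedy Q-learning.

```python
import math
import random

# Hypothetical toy attack graph: node -> list of (next_node, reward).
# Rewards are made-up stand-ins for attack gains derived from vulnerability scores.
GRAPH = {
    "entry": [("web", 2.0), ("mail", 1.0)],
    "web": [("db", 5.0), ("mail", 0.5)],
    "mail": [("db", 3.0)],
    "db": [],  # terminal target, no outgoing exploits
}


def soft_value(q_values, alpha):
    """Soft state value alpha * log(sum(exp(Q / alpha))), computed stably."""
    if not q_values:
        return 0.0  # terminal state
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def soft_policy(q_values, alpha):
    """Boltzmann policy: pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    m = max(q_values)
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def train(graph, alpha=0.5, gamma=0.9, lr=0.1, episodes=2000, seed=0):
    """Tabular Soft Q-learning: sample from the soft policy, back up the soft value."""
    rng = random.Random(seed)
    q = {state: [0.0] * len(edges) for state, edges in graph.items()}
    for _ in range(episodes):
        state = "entry"
        while graph[state]:
            probs = soft_policy(q[state], alpha)
            action = rng.choices(range(len(probs)), weights=probs)[0]
            next_state, reward = graph[state][action]
            # Soft Bellman target: reward plus discounted soft value of the successor.
            target = reward + gamma * soft_value(q[next_state], alpha)
            q[state][action] += lr * (target - q[state][action])
            state = next_state
    return q


def greedy_path(graph, q, start="entry"):
    """Follow the highest-Q action from start until a terminal node is reached."""
    path, state = [start], start
    while graph[state]:
        best = max(range(len(q[state])), key=lambda a: q[state][a])
        state = graph[state][best][0]
        path.append(state)
    return path


if __name__ == "__main__":
    q_table = train(GRAPH)
    print(greedy_path(GRAPH, q_table))
```

Because the backup uses the log-sum-exp soft value instead of a hard maximum, lower-reward branches keep a nonzero probability during training. This is one way to read the abstract's claim that the method captures multiple modes of near-optimal behavior, although the paper's actual model and reward design may differ.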