Computer Science, 2024, Vol. 51, Issue (3): 360-367. doi: 10.11896/jsjkx.221200104

• Information Security

Optimal Penetration Path Generation Based on Maximum Entropy Reinforcement Learning

WANG Yan, WANG Tianjing, SHEN Hang, BAI Guangwei   

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  • Received: 2022-12-15  Revised: 2023-05-22  Online: 2024-03-15  Published: 2024-03-13
  • About author: WANG Yan, born in 1999, postgraduate. His main research interest is AI-based network security. SHEN Hang, born in 1984, Ph.D., associate professor, is a senior member of CCF (No. 19088S). His main research interests include network security and privacy computing.
  • Supported by: National Natural Science Foundation of China (61502230, 61501224), Natural Science Foundation of Jiangsu Province, China (BK20201357) and Six Talent Peaks Project in Jiangsu Province (RJFW-020).

Abstract: Analyzing intrusion intentions and penetration behaviors from the attacker's perspective is of great significance for guiding network security defense. However, most existing penetration paths are constructed from a single snapshot of the network environment, which reduces their reference value once the network changes. To address this problem, this paper proposes an optimal penetration path generation method based on maximum entropy reinforcement learning, which captures multiple modes of near-optimal behavior through exploration in dynamic network environments. First, the penetration process is modeled from the attack graph and vulnerability scores, and the threat posed by each penetration behavior is described by quantifying its attack benefit. Then, considering the complexity of intrusion behavior, a soft Q-learning method based on the maximum entropy model is developed, and the stability of the generated penetration path is ensured by controlling the relative importance of the entropy term and the reward. Finally, the method is applied to a dynamic environment to generate highly available penetration paths. Simulation results show that, compared with existing reinforcement-learning-based baseline methods, the proposed method adapts more robustly to environmental changes and generates higher-yielding penetration paths at a lower cost.
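The pipeline outlined in the abstract (vulnerability-score rewards on an attack graph, then soft Q-learning whose entropy temperature trades exploration against reward) can be illustrated with a minimal tabular sketch. Everything concrete below is a hypothetical assumption for illustration, not the authors' implementation: the graph `ATTACK_GRAPH`, its node names and CVSS-style scores, the reward shaping in `reward`, the exploit failure probability `fail_prob` standing in for a changing environment, and all hyperparameters. The core update is the soft Bellman backup Q(s,a) ← r + γ·V(s'), with soft value V(s) = α·log Σ_a exp(Q(s,a)/α) and Boltzmann policy π(a|s) = exp((Q(s,a) − V(s))/α).

```python
import math
import random

# Hypothetical toy attack graph: state -> list of (next_state, cvss_score).
# Node names and scores are illustrative, not taken from the paper.
ATTACK_GRAPH = {
    "internet": [("web", 7.5), ("mail", 5.0)],
    "web": [("app", 8.8), ("mail", 4.3)],
    "mail": [("app", 6.1)],
    "app": [("db", 9.8), ("file", 6.5)],
    "file": [("db", 5.4)],
    "db": [],  # assumed attack goal: no outgoing exploits
}
GOAL = "db"


def reward(next_state, cvss):
    """Assumed attack benefit: normalized score minus a step cost, plus a goal bonus."""
    r = cvss / 10.0 - 0.5
    return r + 10.0 if next_state == GOAL else r


def soft_value(q_row, alpha):
    """V(s) = alpha * log sum_a exp(Q(s,a)/alpha), computed stably; 0 for terminal states."""
    if not q_row:
        return 0.0
    m = max(q_row)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_row))


def policy(q_row, alpha):
    """Maximum-entropy (Boltzmann) policy pi(a|s) = exp((Q(s,a) - V(s)) / alpha)."""
    v = soft_value(q_row, alpha)
    return [math.exp((q - v) / alpha) for q in q_row]


def soft_q_learning(episodes=2000, alpha=0.5, gamma=0.9, lr=0.1, fail_prob=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: [0.0] * len(edges) for s, edges in ATTACK_GRAPH.items()}
    for _ in range(episodes):
        s = "internet"
        for _ in range(20):  # cap episode length
            edges = ATTACK_GRAPH[s]
            if not edges:
                break
            # Sample an exploit from the soft policy (exploration comes from the entropy term).
            a = rng.choices(range(len(edges)), weights=policy(q[s], alpha))[0]
            nxt, cvss = edges[a]
            # Assumed dynamics: an exploit fails with fail_prob, leaving the attacker in place.
            if rng.random() < fail_prob:
                nxt, r = s, -0.5
            else:
                r = reward(nxt, cvss)
            # Soft Bellman backup toward r + gamma * V(s').
            target = r + gamma * soft_value(q[nxt], alpha)
            q[s][a] += lr * (target - q[s][a])
            s = nxt
    return q


def greedy_path(q, start="internet"):
    """Follow argmax_a Q(s,a) from start until a state with no outgoing exploits."""
    path = [start]
    s = start
    while ATTACK_GRAPH[s] and len(path) <= len(ATTACK_GRAPH):
        a = max(range(len(q[s])), key=lambda i: q[s][i])
        s = ATTACK_GRAPH[s][a][0]
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = soft_q_learning()
    print(greedy_path(q_table))
```

A larger `alpha` keeps the policy closer to uniform over the available exploits, so more alternative paths keep being explored; as `alpha` shrinks, the soft value approaches max_a Q(s,a) and the update approaches ordinary Q-learning.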

Key words: Maximum entropy reinforcement learning, Attack graph, Soft Q-learning, Penetration path

CLC Number: TP393