Computer Science ›› 2023, Vol. 50 ›› Issue (4): 359-368. doi: 10.11896/jsjkx.220300040

• Information Security •

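APT Attack Detection Method Based on Heterogeneous Provenance Graph Learning

DONG Chengyu, LYU Mingqi, CHEN Tieming, ZHU Tiantian

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023
  • Received: 2022-03-04 Revised: 2022-09-28 Online: 2023-04-15 Published: 2023-04-06
  • Corresponding author: LYU Mingqi (lvmingqi@zjut.edu.cn)
  • About the author: (dcyzjut@foxmail.com)
  • Supported by:
    Joint Funds of the National Natural Science Foundation of China, Key Program (U1936215); Key R&D Program of Zhejiang Province (2021C01117); Young Scientists Fund of the National Natural Science Foundation of China (62002324); Major Program of the Natural Science Foundation of Zhejiang Province (LD22F020002); Exploration Program of the Natural Science Foundation of Zhejiang Province (LQ21F020016); "Ten Thousand People Program" Technology Innovation Leading Talent Project of Zhejiang Province (2020R52011)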

Heterogeneous Provenance Graph Learning Model Based APT Detection

DONG Chengyu, LYU Mingqi, CHEN Tieming, ZHU Tiantian   

  1. College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2022-03-04 Revised: 2022-09-28 Online: 2023-04-15 Published: 2023-04-06
  • About author: DONG Chengyu, born in 1996, postgraduate. His main research interests include data mining and graph neural networks.
    LYU Mingqi, born in 1982, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include data mining and ubiquitous computing.
  • Supported by:
    Joint Funds of the National Natural Science Foundation of China (U1936215), Zhejiang Key R&D Projects (2021C01117), National Natural Science Foundation of China (62002324), Major Program of Natural Science Foundation of Zhejiang Province (LD22F020002), Zhejiang Provincial Natural Science Foundation of China (LQ21F020016) and "Ten Thousand People Program" Technology Innovation Leading Talent Project in Zhejiang Province (2020R52011).

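Abstract: An APT (Advanced Persistent Threat) attack is an advanced, persistent cyber-attack carried out by a hacker organization against a target information system. The main characteristics of APT attacks are their long duration and their combined use of multiple attack techniques, which make them difficult for traditional intrusion detection methods to detect effectively. Most existing APT detection systems are built by manually designing detection rules on top of curated domain knowledge (such as the ATT&CK knowledge base of cyber attack and defense). However, this approach has a low level of intelligence and poor scalability, and it struggles to detect unknown APT attacks. To address this, system behavior is monitored through operating system kernel logs, and on this basis an intelligent APT detection method based on graph neural networks is proposed. First, to capture the contextual associations among the diverse attack techniques of an APT attack, the system entities contained in the kernel logs (such as processes, files, and sockets) and their relationships are modeled as a provenance graph, and a heterogeneous graph learning algorithm is used to represent each system entity as a semantic vector. Then, to address the graph scale explosion caused by the long-term behavior of APT attacks, a method for sampling subgraphs from a large-scale heterogeneous graph is proposed, and on this basis the key system entities in them are classified with a graph convolution algorithm. Finally, a series of experiments is conducted on two real APT attack datasets. The experimental results show that the overall performance of the proposed method is better than that of other learning-based detection models and of state-of-the-art rule-based APT detection systems.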

Key words: APT attack detection, Graph neural network, Provenance graph, Host security, Data-driven security

Abstract: An APT (advanced persistent threat) is an advanced, persistent cyber-attack carried out by a hacker organization against a target information system. APTs are characterized by long duration and the combined use of multiple attack techniques, which makes traditional intrusion detection methods ineffective. Most existing APT detection systems rely on manually designed rules derived from domain knowledge (e.g., ATT&CK). However, this approach lacks intelligence and generalization ability, and has difficulty detecting unknown APT attacks. To address this limitation, this paper proposes an intelligent APT detection method based on provenance data and graph neural networks. First, to capture the rich contextual information in the diverse attack techniques of APTs, it models the system entities (e.g., processes, files, sockets) in the provenance data and their relationships as a provenance graph, and learns a semantic vector representation for each system entity with a heterogeneous graph learning model. Then, to address the graph scale explosion caused by the long-term behavior of APTs, it samples a local subgraph from the large-scale heterogeneous graph and classifies the key system entities as malicious or benign with graph convolutional networks. A series of experiments is conducted on two datasets containing real APT attacks. The experimental results show that the overall performance of the proposed method exceeds that of other learning-based detection models, as well as state-of-the-art rule-based APT detection systems.

Key words: APT detection, Graph neural network, Provenance graph, Host-based security, Data-driven security

CLC Number:

  • TP393
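
The pipeline outlined in the abstract (model system entities and their relationships as a provenance graph, sample a local subgraph, then aggregate neighborhood information around a key entity) can be illustrated with a minimal sketch. This is not the authors' implementation: the event tuples in `EVENTS`, all function names, the one-hot type features, and the untrained mean aggregation standing in for a graph convolution layer are assumptions made purely for illustration.

```python
from collections import defaultdict, deque

# Hypothetical audit events: (subject, subject_type, relation, object, object_type).
EVENTS = [
    ("bash", "process", "fork", "curl", "process"),
    ("curl", "process", "connect", "10.0.0.5:443", "socket"),
    ("curl", "process", "write", "/tmp/payload", "file"),
    ("bash", "process", "read", "/etc/passwd", "file"),
    ("payload", "process", "read", "/tmp/payload", "file"),
    ("payload", "process", "connect", "10.0.0.9:8080", "socket"),
]

# Assumed entity types, following the examples named in the abstract.
TYPES = ["process", "file", "socket"]


def build_provenance_graph(events):
    """Return (node_types, adjacency) for an undirected heterogeneous graph."""
    node_types = {}
    adjacency = defaultdict(set)
    for subj, subj_type, relation, obj, obj_type in events:
        node_types[subj] = subj_type
        node_types[obj] = obj_type
        # Store the relation with each edge so edge types are not lost.
        adjacency[subj].add((relation, obj))
        adjacency[obj].add((relation, subj))
    return node_types, adjacency


def sample_local_subgraph(adjacency, seed, hops=2, max_neighbors=10):
    """Breadth-first sample of the nodes within `hops` edges of `seed`."""
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        # Sort for determinism, then cap the fan-out to bound subgraph size.
        for _, neighbor in sorted(adjacency[node])[:max_neighbors]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited


def one_hot(node_type):
    """Stand-in feature vector; the paper learns semantic embeddings instead."""
    return [1.0 if node_type == t else 0.0 for t in TYPES]


def mean_aggregate(adjacency, nodes, features):
    """Average each node's features with those of its neighbors in `nodes`."""
    aggregated = {}
    for node in nodes:
        group = {node} | {n for _, n in adjacency[node] if n in nodes}
        dim = len(features[node])
        aggregated[node] = [
            sum(features[m][i] for m in group) / len(group) for i in range(dim)
        ]
    return aggregated


node_types, adjacency = build_provenance_graph(EVENTS)
subgraph = sample_local_subgraph(adjacency, "curl", hops=2)
features = {name: one_hot(kind) for name, kind in node_types.items()}
aggregated = mean_aggregate(adjacency, subgraph, features)
```

In a real system the aggregated vectors would feed a trained classifier that labels the key entity as malicious or benign. Capping `hops` and `max_neighbors` is one simple way to keep the sampled subgraph small as the full graph grows.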