Computer Science, 2025, Vol. 52, Issue (2): 344-352. doi: 10.11896/jsjkx.240400029

• Information Security •

Augmenter: Event-level Intrusion Detection Based on Data Provenance Graph

SUN Hongbin1,2, WANG Su3, WANG Zhiliang1,3, JIANG Zheyu1, YANG Jiahai1,3, ZHANG Hui1,2

  1 Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
  2 Quancheng Laboratory, Ji'nan 250000, China
  3 Zhongguancun Laboratory, Beijing 100094, China
  • Received: 2024-04-03 Revised: 2024-08-30 Online: 2025-02-15 Published: 2025-02-17
  • Corresponding author: ZHANG Hui (hzhang@cernet.edu.cn)
  • About author: SUN Hongbin (sunhb21@mails.tsinghua.edu.cn), born in 1999, postgraduate. His main research interest is provenance-based intrusion detection.
    ZHANG Hui, born in 1973, postgraduate. His main research interest is network measurement.
  • Supported by:
    Key Project of Quancheng Laboratory (QCLZD202304-2) and Research Project of Provincial Laboratory of Shandong, China (SYS202201).

Abstract: In recent years, advanced persistent threat (APT) attacks have become increasingly frequent. Data provenance graphs contain rich contextual information that reflects how processes execute, which gives them the potential to detect APT attacks; provenance-based intrusion detection systems (PIDS) have therefore attracted considerable attention. A PIDS identifies malicious behavior by capturing system logs and building provenance graphs from them. PIDS face three main challenges: efficiency, generality, and real-time capability, of which efficiency is the most pressing. When current PIDS detect anomalous behavior, a single anomalous node or graph can generate thousands of alerts, many of them false positives, which burdens security personnel. This paper presents Augmenter, the first PIDS that addresses all three challenges simultaneously. Augmenter partitions processes into communities based on the information fields of their nodes, which lets it learn the behavior of different processes effectively. In addition, Augmenter introduces a time-window strategy for subgraph partitioning and extracts incremental node features with an unsupervised method based on graph mutual information maximization; these incremental features amplify abnormal behavior and separate it from normal behavior. Finally, Augmenter trains multiple clustering models according to process type to perform event-level detection, and detecting anomalies at the event level localizes attack behavior more precisely. Augmenter is evaluated on the DARPA dataset, and measuring the running efficiency of the detection phase confirms its real-time performance. In terms of detection capability, Augmenter is compared with the state-of-the-art systems Kairos and ThreaTrace: Augmenter achieves a precision of 0.83 and a recall of 0.97, whereas Kairos achieves 0.17 and 0.80 and ThreaTrace achieves 0.29 and 0.76, showing that Augmenter attains significantly higher precision and overall detection performance.
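The detection flow described in the abstract (time-window subgraph partitioning, grouping processes by a node information field, incremental features, and per-process-type clustering that raises event-level alerts) can be illustrated with a small plain-Python sketch. This is a hypothetical illustration under stated assumptions, not Augmenter's implementation: the `Event` record, the action vocabulary, using the executable base name as both community key and process type, the hand-crafted "action one-hot plus count growth since the previous window" feature (standing in for the learned graph mutual information embeddings), the k-means model, and the distance-plus-margin threshold are all choices made for this example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

ACTIONS = ("read", "write", "exec", "connect")  # hypothetical event vocabulary


@dataclass(frozen=True)
class Event:
    ts: float    # timestamp in seconds
    pid: str     # subject (process) node id
    exe: str     # process information field used for grouping (assumed)
    action: str  # one of ACTIONS


def split_windows(events, width):
    """Partition an event stream into consecutive fixed-width time windows."""
    buckets = defaultdict(list)
    for e in events:
        buckets[int(e.ts // width)].append(e)
    return [buckets[k] for k in sorted(buckets)]


def process_type(exe):
    """Hypothetical community key: the executable's base name."""
    return exe.rsplit("/", 1)[-1]


def event_features(windows):
    """Yield (event, vector): one-hot of the event's action plus how much the
    subject's count of that action grew since the previous window."""
    prev = {}
    for win in windows:
        cur = defaultdict(lambda: [0] * len(ACTIONS))
        for e in win:
            cur[e.pid][ACTIONS.index(e.action)] += 1
        for e in win:
            a = ACTIONS.index(e.action)
            before = prev.get(e.pid, [0] * len(ACTIONS))[a]
            onehot = [1 if i == a else 0 for i in range(len(ACTIONS))]
            yield e, onehot + [cur[e.pid][a] - before]
        prev = dict(cur)


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k distinct points."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[min(range(len(centroids)),
                       key=lambda i: math.dist(p, centroids[i]))].append(p)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


class TypeClusterDetector:
    """One k-means model per process type; an event is flagged when it lies
    farther from every centroid than any benign training event, plus a margin."""
    def __init__(self, k=4, margin=1.0):
        self.k, self.margin, self.models = k, margin, {}

    def fit(self, windows):
        by_type = defaultdict(list)
        for e, v in event_features(windows):
            by_type[process_type(e.exe)].append(v)
        for t, pts in by_type.items():
            cents = kmeans(pts, self.k)
            radius = max(min(math.dist(p, c) for c in cents) for p in pts)
            self.models[t] = (cents, radius + self.margin)
        return self

    def detect(self, windows):
        alerts = []
        for e, v in event_features(windows):
            model = self.models.get(process_type(e.exe))
            # Unseen process types are flagged; known ones use the distance test.
            if model is None or min(math.dist(v, c) for c in model[0]) > model[1]:
                alerts.append(e)
        return alerts


# Demo: a benign web server reads twice and writes once in every 10 s window.
NGINX = "/usr/sbin/nginx"
train = [Event(w * 10 + d, "p1", NGINX, act)
         for w in range(5) for d, act in ((1, "read"), (2, "read"), (3, "write"))]
detector = TypeClusterDetector().fit(split_windows(train, 10))
test = [Event(ts, "p1", NGINX, act) for ts, act in
        [(1, "read"), (2, "read"), (3, "write"), (11, "read"), (12, "read"),
         (13, "write"), (14, "exec"), (15, "connect"), (16, "connect"), (17, "connect")]]
alerts = detector.detect(split_windows(test, 10))
print([(e.ts, e.action) for e in alerts])
# → [(14, 'exec'), (15, 'connect'), (16, 'connect'), (17, 'connect')]
```

In the demo, the reads and writes in the second test window match the trained clusters, while the newly appearing `exec` and the burst of `connect` events fall outside every centroid's radius. Only those four events are reported, rather than the whole process or the whole window, which is the kind of event-level granularity the abstract argues reduces alert volume.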

Key words: Advanced persistent threat, Data provenance graph, Intrusion detection, Incremental feature, Abnormal behavior
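The abstract names graph mutual information maximization as Augmenter's unsupervised feature extractor, and the reference list includes Deep Graph Infomax (DGI) [24]. Assuming Augmenter follows the standard DGI formulation (the abstract does not confirm its encoder, corruption function, or readout), the objective maximized during training is:

```latex
\begin{gathered}
\mathcal{L}=\frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}_{(\mathbf{X},\mathbf{A})}\left[\log\mathcal{D}(\mathbf{h}_i,\mathbf{s})\right]
+\sum_{j=1}^{M}\mathbb{E}_{(\tilde{\mathbf{X}},\tilde{\mathbf{A}})}\left[\log\left(1-\mathcal{D}(\tilde{\mathbf{h}}_j,\mathbf{s})\right)\right]\right)\\
\mathbf{s}=\sigma\Big(\frac{1}{N}\sum_{i=1}^{N}\mathbf{h}_i\Big),\qquad
\mathcal{D}(\mathbf{h},\mathbf{s})=\sigma\left(\mathbf{h}^{\top}\mathbf{W}\mathbf{s}\right)
\end{gathered}
```

Here h_i are the encoder outputs for the N nodes of the real (sub)graph with features X and adjacency A, h̃_j are the outputs for the M nodes of a corrupted graph (X̃, Ã), s is a mean-pooled summary of the real graph, σ is the logistic sigmoid, and W is a learnable matrix. Maximizing the objective trains the discriminator D to separate real node-summary pairs from corrupted ones, which pushes the encoder to capture structure without labels. In the original DGI paper the corruption keeps A fixed and row-shuffles X.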

CLC number: TP393

References:
[1]PASQUIER T,HAN X,GOLDSTEIN M,et al.Practical whole-system provenance capture[C]//Proceedings of the 2017 Symposium on Cloud Computing.2017:405-418.
[2]DONG F,LI S,JIANG P,et al.Are we there yet? An industrial viewpoint on provenance-based endpoint detection and response tools[C]//Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security.2023:2396-2410.
[3]ALTINISIK E,DENIZ F,SENCAR H T.ProvG-Searcher:A Graph Representation Learning Approach for Efficient Provenance Graph Search[C]//Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security.2023:2247-2261.
[4]CHENG Z,LV Q,LIANG J,et al.KAIROS:Practical Intrusion Detection and Investigation using Whole-system Provenance[C]//2024 IEEE Symposium on Security and Privacy(SP).2024.
[5]WANG S,WANG Z,ZHOU T,et al.Threatrace:Detecting and tracing host-based threats in node level through provenance graph learning[J].IEEE Transactions on Information Forensics and Security,2022,17:3972-3987.
[6]MANZOOR E,MILAJERDI S M,AKOGLU L.Fast memory-efficient anomaly detection in streaming heterogeneous graphs[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2016:1035-1044.
[7]HAN X,PASQUIER T,BATES A,et al.UNICORN:Runtime Provenance-Based Detector for Advanced Persistent Threats[C]//Network and Distributed System Security Symposium.2020.
[8]WANG Q,HASSAN W U,LI D,et al.You Are What You Do:Hunting Stealthy Malware via Data Provenance Analysis[C]//Network and Distributed System Security Symposium.2020.
[9]YANG F,XU J,XIONG C,et al.PROGRAPHER:An Anomaly Detection System based on Provenance Graph Embedding[C]//32nd USENIX Security Symposium.2023:4355-4372.
[10]XU Z,FANG P,LIU C,et al.Depcomm:Graph summarization on system audit logs for attack investigation[C]//2022 IEEE Symposium on Security and Privacy(SP).2022:540-557.
[11]ZENG J,CHUA Z L,CHEN Y,et al.WATSON:Abstracting Behaviors from Audit Logs via Aggregation of Contextual Semantics[C]//Network and Distributed System Security Symposium.2021.
[12]ZENG J,WANG X,LIU J,et al.ShadeWatcher:Recommendation-guided cyber threat analysis using system audit records[C]//2022 IEEE Symposium on Security and Privacy(SP).2022:489-506.
[13]REHMAN M U,AHMADI H,HASSAN W U.FLASH:A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning[C]//2024 IEEE Symposium on Security and Privacy(SP).2024:139-139.
[14]HOSSAIN M N,MILAJERDI S M,WANG J,et al.SLEUTH:Real-time attack scenario reconstruction from COTS audit data[C]//26th USENIX Security Symposium.2017:487-504.
[15]HOSSAIN M N,SHEIKHI S,SEKAR R.Combating dependence explosion in forensic analysis using alternative tag propagation semantics[C]//2020 IEEE Symposium on Security and Privacy(SP).2020:1139-1155.
[16]XIONG C,ZHU T,DONG W,et al.CONAN:A practical real-time APT detection system with high accuracy and efficiency[J].IEEE Transactions on Dependable and Secure Computing,2020,19(1):551-565.
[17]MILAJERDI S M,GJOMEMO R,ESHETE B,et al.Holmes:real-time APT detection through correlation of suspicious information flows[C]//2019 IEEE Symposium on Security and Privacy(SP).2019:1137-1152.
[18]MILAJERDI S M,ESHETE B,GJOMEMO R,et al.Poirot:Aligning attack behavior with kernel audit records for cyber threat hunting[C]//Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security.2019:1795-1812.
[19]HASSAN W U,BATES A,MARINO D.Tactical provenance analysis for endpoint detection and response systems[C]//2020 IEEE Symposium on Security and Privacy(SP).2020:1172-1189.
[20]KEROMYTIS A.Transparent computing engagement 3 data release [EB/OL].https://github.com/darpa-i2o/Transparent-Computing.
[21]HAMILTON W L,YING R,LESKOVEC J.Inductive representation learning on large graphs[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:1025-1035.
[22]BRIDGE K,SHARKEY K,COULTER D,et al.About Event Tracing [EB/OL].https://learn.microsoft.com/en-us/windows/win32/etw/about-event-tracing.
[23]The Linux Audit Framework [EB/OL].https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/security_guide/chap-system_auditing.
[24]VELICKOVIC P,FEDUS W,HAMILTON W L,et al.Deep graph infomax[C]//ICLR.2019.
[25]SCHLICHTKRULL M,KIPF T N,BLOEM P,et al.Modeling relational data with graph convolutional networks[C]//The Semantic Web:15th International Conference.2018:593-607.