Computer Science ›› 2023, Vol. 50 ›› Issue (4): 359-368.doi: 10.11896/jsjkx.220300040

• Information Security

Heterogeneous Provenance Graph Learning Model Based APT Detection

DONG Chengyu, LYU Mingqi, CHEN Tieming, ZHU Tiantian   

  1. College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2022-03-04 Revised: 2022-09-28 Online: 2023-04-15 Published: 2023-04-06
  • About author: DONG Chengyu, born in 1996, postgraduate. His main research interests include data mining and graph neural networks.
    LYU Mingqi, born in 1982, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include data mining and ubiquitous computing.
  • Supported by: Joint Funds of the National Natural Science Foundation of China (U1936215), Zhejiang Key R&D Projects (2021C01117), National Natural Science Foundation of China (62002324), Major Program of Natural Science Foundation of Zhejiang Province (LD22F020002), Zhejiang Provincial Natural Science Foundation of China (LQ21F020016) and "Ten Thousand People Program" Technology Innovation Leading Talent Project in Zhejiang Province (2020R52011).

Abstract: APTs (advanced persistent threats) are advanced, persistent cyber-attacks launched by hacker organizations to breach target information systems. APTs are usually characterized by long duration and multiple attack techniques, which renders traditional intrusion detection methods ineffective. Most existing APT detection systems rely on rules designed manually with reference to domain knowledge (e.g., ATT&CK). However, this approach lacks intelligence and generalization ability, and it struggles to detect unknown APT attacks. To address this limitation, this paper proposes an intelligent APT detection method based on provenance data and graph neural networks. To capture the rich context information in the diverse attack techniques of APTs, the method first models the system entities (e.g., processes, files, sockets) in the provenance data as a provenance graph, and learns a semantic vector representation for each system entity with a heterogeneous graph learning model. Then, to solve the problem of graph scale explosion caused by the long-term behavior of APTs, it performs APT detection by sampling a local graph from the large-scale heterogeneous graph and classifying the key system entities as malicious or benign with graph convolutional networks. A series of experiments is conducted on two datasets containing real APT attacks. Experimental results show that the overall performance of the proposed method surpasses that of other learning-based detection models as well as state-of-the-art rule-based APT detection systems.
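To make the pipeline summarized in the abstract more concrete, the following minimal Python sketch illustrates two of its steps on toy data: modeling audit events over processes, files, and sockets as a heterogeneous provenance graph, and sampling a local k-hop graph around a key entity before a GCN-style neighborhood aggregation. It is an illustrative assumption, not the authors' implementation. The class and function names (`ProvenanceGraph`, `sample_local_graph`, `mean_aggregate`), the event tuples, and the one-hot entity-type features standing in for the learned heterogeneous embeddings are all hypothetical, and the aggregation omits the learned weight matrix and nonlinearity of a real graph convolution.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical heterogeneous provenance graph: typed entities, action-labelled edges.

    Edges point from the acting subject (a process) to the object it acts on.
    """
    node_type: dict = field(default_factory=dict)  # entity id -> entity type
    out_edges: dict = field(default_factory=lambda: defaultdict(list))
    in_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_event(self, src, src_type, action, dst, dst_type):
        """Record one audit event (subject, action, object) as a typed edge."""
        self.node_type[src] = src_type
        self.node_type[dst] = dst_type
        self.out_edges[src].append((action, dst))
        self.in_edges[dst].append((action, src))

    def neighbors(self, node):
        """Yield (action, neighbor) pairs over both edge directions."""
        yield from self.out_edges.get(node, [])
        yield from self.in_edges.get(node, [])


def sample_local_graph(g, center, hops=2):
    """Breadth-first search: all entities within `hops` edges of `center`."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand past the hop limit
        for _, nbr in g.neighbors(node):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def mean_aggregate(g, nodes, features):
    """Neighborhood averaging restricted to the sampled local graph.

    Each node's new vector is the mean of its own vector and those of its
    in-subgraph neighbors (repeated events are counted repeatedly). A real GCN
    layer would also apply a learned weight matrix and a nonlinearity.
    """
    out = {}
    for v in nodes:
        group = [features[v]] + [features[n] for _, n in g.neighbors(v) if n in nodes]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Placeholder features: one-hot entity type instead of learned embeddings.
TYPES = ["process", "file", "socket"]


def one_hot(entity_type):
    return [1.0 if entity_type == t else 0.0 for t in TYPES]


# Toy events; the entity names are hypothetical.
g = ProvenanceGraph()
g.add_event("bash", "process", "fork", "curl", "process")
g.add_event("curl", "process", "connect", "10.0.0.5:443", "socket")
g.add_event("curl", "process", "write", "/tmp/payload", "file")
g.add_event("payload", "process", "read", "/tmp/payload", "file")
g.add_event("sshd", "process", "read", "/etc/passwd", "file")  # unrelated activity

local = sample_local_graph(g, "curl", hops=2)
features = {n: one_hot(t) for n, t in g.node_type.items()}
h = mean_aggregate(g, local, features)
print(sorted(local))  # ['/tmp/payload', '10.0.0.5:443', 'bash', 'curl', 'payload']
print(h["curl"])      # [0.5, 0.25, 0.25]
```

Restricting aggregation to the sampled nodes keeps the work proportional to the local graph rather than to the full, ever-growing provenance graph, which is the motivation the abstract gives for sampling. In the described method, a trained classifier would then label `curl` and the other key entities as malicious or benign.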

Key words: APT detection, Graph neural network, Provenance graph, Host-based security, Data-driven security

CLC Number: TP393