Computer Science, 2024, Vol. 51, Issue (10): 408-415. doi: 10.11896/jsjkx.230700014

• Information Security •

System Call Host Intrusion Detection Technology Based on Generative Adversarial Network

FAN Yi, HU Tao, YI Peng

  1. Information Technology Institute, Information Engineering University, Zhengzhou 450002, China
  • Received: 2023-07-03 Revised: 2023-11-13 Online: 2024-10-15 Published: 2024-10-11
  • Corresponding author: HU Tao (hutaondsc@163.com)
  • Author email: fy.fzh@foxmail.com
  • About author: FAN Yi, born in 1998, postgraduate. His main research interest is host anomaly detection.
    HU Tao, born in 1993, Ph.D, assistant researcher. His main research interests include new network architecture and active cyber defense.
  • Supported by:
    Key Science and Technology Project of Henan Province (221100240100) and Zhengzhou Key Science and Technology Innovation Project (2021KJZX0060-3).

Abstract: System call information from programs is important data for detecting host anomalies. However, anomalies occur relatively rarely, so the collected system call data are often imbalanced. With few abnormal system call samples, a detection model cannot fully learn the abnormal behavior patterns of a program, which leads to low accuracy and a high false positive rate in intrusion detection. To address these problems, a system call host intrusion detection method based on generative adversarial networks is proposed, which alleviates data imbalance by augmenting the abnormal system call data. First, the system call trace of a program is divided into fixed-length N-Gram sequences. Second, SeqGAN is used to generate synthetic N-Gram sequences from the N-Gram sequences of the abnormal data. The generated abnormal data are then combined with the original dataset to train the intrusion detection model. Experiments are carried out on ADFA-LD, a host system call dataset, and Drebin, an Android system call dataset. The proposed method achieves detection accuracies of 0.986 and 0.989 and false positive rates of 0.011 and 0, respectively, outperforming existing intrusion detection methods based on a hybrid neural network model, WaveNet, Relaxed-SVM, and RNN-VED.

Key words: Host intrusion detection, System call, Generative adversarial network, Deep learning, Data imbalance
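
The preprocessing and augmentation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `to_ngrams` and `build_training_set`, the `stride` parameter (a sliding window with step 1 by default), and the 0/1 labels are assumptions made for illustration. The SeqGAN generator is not implemented here; its output is assumed to arrive as a ready-made list of synthetic N-Grams.

```python
from typing import List, Sequence, Tuple

NGram = Tuple[int, ...]


def to_ngrams(trace: Sequence[int], n: int, stride: int = 1) -> List[NGram]:
    """Split one system call trace into fixed-length N-Grams.

    The sliding window and its stride are assumptions; the abstract only
    states that traces are divided into fixed-length N-Gram sequences.
    Traces shorter than n yield an empty list.
    """
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    return [tuple(trace[i:i + n]) for i in range(0, len(trace) - n + 1, stride)]


def build_training_set(
    normal_traces: Sequence[Sequence[int]],
    attack_traces: Sequence[Sequence[int]],
    synthetic_attack_ngrams: Sequence[Sequence[int]],
    n: int,
) -> List[Tuple[NGram, int]]:
    """Combine original N-Grams with synthetic abnormal N-Grams.

    Label 0 marks normal samples and label 1 marks abnormal samples
    (hypothetical labeling). synthetic_attack_ngrams stands in for the
    sequences a trained generator would produce.
    """
    samples: List[Tuple[NGram, int]] = []
    for trace in normal_traces:
        samples.extend((gram, 0) for gram in to_ngrams(trace, n))
    for trace in attack_traces:
        samples.extend((gram, 1) for gram in to_ngrams(trace, n))
    # Synthetic sequences are added only to the minority (abnormal) class.
    samples.extend((tuple(gram), 1) for gram in synthetic_attack_ngrams)
    return samples
```

Here each trace is a sequence of integer system call numbers. Adding synthetic samples only to the abnormal class is what shifts the class ratio toward balance before the detection model is trained.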

CLC Number:

  • TP309