Computer Science, 2023, Vol. 50, Issue (3): 380-390. doi: 10.11896/jsjkx.220100032

• Information Security •

Semi-supervised Network Traffic Anomaly Detection Method Based on GRU

LI Haitao1, WANG Ruimin2, DONG Weiyu2, JIANG Liehui2

  1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China
  2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Received: 2022-01-04  Revised: 2022-07-17  Online: 2023-03-15  Published: 2023-03-15
  • Corresponding author: WANG Ruimin (380430313@qq.com)
  • About author: (926206615@qq.com)
    LI Haitao, born in 1994, postgraduate. His main research interests include cyber security and intrusion detection.
    WANG Ruimin, born in 1982, Ph.D., associate professor. Her main research interests include cyber security and IoT device identification.
  • Supported by:
    National Key R&D Program of China (2018YFB0804500).

Abstract: An intrusion detection system (IDS) raises an alarm when a network attack occurs, and detecting unknown attacks in the network is a key challenge for IDSs. Deep learning plays an important role in network traffic anomaly detection, but most existing methods have high false positive rates, and most models are trained with supervised learning. To address this, a semi-supervised network traffic anomaly detection method based on the gated recurrent unit (GRU), named SEMI-GRU, is proposed. It combines a multi-layer bidirectional GRU neural network (MLB-GRU) with an improved feedforward neural network (FNN) and adopts data oversampling and a semi-supervised training scheme. Detection performance is evaluated in both binary and multi-class classification settings on the NSL-KDD, UNSW-NB15 and CIC-Bell-DNS-EXF-2021 datasets. Compared with classic machine learning models and deep learning models such as DNN and ANN, SEMI-GRU performs better in accuracy, precision, recall, false positive rate and F1 score. In the NSL-KDD binary and multi-class tasks, SEMI-GRU leads the other methods in F1 score, reaching 93.08% and 82.15%, respectively. In the UNSW-NB15 binary and multi-class tasks, SEMI-GRU also outperforms the compared methods in F1 score, reaching 88.13% and 75.24%, respectively. In the binary classification task on the light-file attack subset of CIC-Bell-DNS-EXF-2021, SEMI-GRU classifies all test samples correctly.
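The abstract compares methods by accuracy, precision, recall, false positive rate (FPR) and F1 score. As a reference for how these figures are usually computed, here is a minimal sketch of the standard definitions, assuming attack traffic is the positive class in the binary task. The abstract does not say how per-class F1 scores are averaged in the multi-class tasks; the macro average below is an assumption, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attacks correctly flagged as anomalous
    fp: int  # normal traffic wrongly flagged (false alarms)
    tn: int  # normal traffic correctly passed
    fn: int  # attacks that were missed


def safe_div(num: float, den: float) -> float:
    """Return 0.0 instead of raising when a metric is undefined."""
    return num / den if den else 0.0


def confusion_from_labels(y_true, y_pred, positive=1) -> Confusion:
    """Count one-vs-rest outcomes for the given positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def binary_metrics(c: Confusion) -> dict:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)  # also called detection rate
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "fpr": safe_div(c.fp, c.fp + c.tn),  # share of normal traffic flagged
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [binary_metrics(confusion_from_labels(y_true, y_pred, lab))["f1"]
              for lab in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    c = confusion_from_labels([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
    print(c, binary_metrics(c))
```

Reporting FPR alongside recall matters for the abstract's motivation: a detector can reach high recall by flagging everything, which the FPR term exposes.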

Key words: Intrusion detection system, Semi-supervised learning, Multilayer bidirectional GRU, Feedforward neural network, NSL-KDD, UNSW-NB15
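The keywords name a multi-layer bidirectional GRU feeding a feedforward network. To illustrate that general shape without external dependencies, the sketch below assumes a standard GRU cell (update gate z, reset gate r, candidate state n, with h_t = (1 - z) * h_{t-1} + z * n element-wise), stacks bidirectional layers, and classifies from the final forward and backward states through a one-hidden-layer ReLU head. Class names, layer sizes, the uniform random initialization and the pooling choice are all assumptions: the abstract does not describe how SEMI-GRU shapes its input, sizes its layers or modifies its FNN, so this is not the authors' model, and its weights are untrained.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5) -> Matrix:
    # Arbitrary uniform init, chosen only to keep the sketch self-contained.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class GRUCell:
    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: W acts on x_t, U on the previous state.
        self.params = {
            gate: (rand_matrix(rng, hidden_size, input_size),
                   rand_matrix(rng, hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in ("z", "r", "n")
        }

    def _affine(self, gate: str, x: Vector, h: Vector) -> Vector:
        w, u, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, reset_h)]
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


class BiGRULayer:
    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.fwd = GRUCell(input_size, hidden_size, rng)
        self.bwd = GRUCell(input_size, hidden_size, rng)

    @staticmethod
    def _run(cell: GRUCell, xs: list) -> list:
        h, outputs = [0.0] * cell.hidden_size, []
        for x in xs:
            h = cell.step(x, h)
            outputs.append(h)
        return outputs

    def __call__(self, seq: list) -> list:
        forward = self._run(self.fwd, seq)
        backward = self._run(self.bwd, seq[::-1])[::-1]  # re-align to time order
        return [hf + hb for hf, hb in zip(forward, backward)]  # concat: 2 * hidden


class BiGRUClassifier:
    """Hypothetical stacked Bi-GRU + ReLU FNN head; requires num_layers >= 1."""

    def __init__(self, input_size, hidden_size, num_layers, ffn_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_size, self.layers, size = hidden_size, [], input_size
        for _ in range(num_layers):
            self.layers.append(BiGRULayer(size, hidden_size, rng))
            size = 2 * hidden_size  # the next layer consumes both directions
        self.w1, self.b1 = rand_matrix(rng, ffn_size, size), [0.0] * ffn_size
        self.w2, self.b2 = rand_matrix(rng, num_classes, ffn_size), [0.0] * num_classes

    def predict_proba(self, seq: list) -> Vector:
        for layer in self.layers:
            seq = layer(seq)
        k = self.hidden_size
        summary = seq[-1][:k] + seq[0][k:]  # last forward state + last backward state
        hidden = [max(0.0, v + b) for v, b in zip(matvec(self.w1, summary), self.b1)]
        logits = [v + b for v, b in zip(matvec(self.w2, hidden), self.b2)]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]  # softmax, shifted for stability
        return [e / sum(exps) for e in exps]
```

For example, `BiGRUClassifier(3, 4, 2, 6, 5).predict_proba(seq)` on a three-step sequence of 3-dimensional feature vectors returns five class probabilities that sum to 1. A practical implementation would more likely use a framework layer such as PyTorch's `torch.nn.GRU` with `num_layers` and `bidirectional=True`; note that PyTorch writes the state update with the roles of z reversed, which is an equivalent parameterization.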

CLC number: TP181