Design of Intrusion Detection System Based on Sampling Ensemble Algorithm

doi:10.11896/jsjkx.201100101

Abstract

Abstract: As the second line of defense behind the firewall, intrusion detection systems are widely used in network security, and machine-learning-based intrusion detection systems have attracted growing attention for their superior detection performance. To improve detection performance on multi-class imbalanced data, this paper proposes an intrusion detection system based on the optimal sampling ensemble algorithm (OSEC). OSEC first converts the multi-class detection problem into multiple binary classification problems following the one-vs-all principle. For each binary problem, it then selects the sampling ensemble algorithm with the best AUC to alleviate the data imbalance. Finally, the class decision module designed in the paper determines the specific class of each sample under test. In simulations on the NSL-KDD data set, the system improves the F1 score on R2L and U2R by 0.595 and 0.185, respectively, compared with the traditional method; compared with the latest intrusion detection system, it improves overall detection accuracy by 1.4%.

Key words: AUC, Ensemble learning, Intrusion detection, Multi-class imbalanced, NSL-KDD, Resampling
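The three steps described in the abstract (one-vs-all decomposition, per-problem selection by AUC, final class decision) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the candidate scorers `a` and `b` are hand-written stand-ins for trained sampling ensemble models, the toy data and class names are invented, and the final decision rule (take the class whose selected scorer gives the highest score) is an assumption, since the abstract does not specify how the class decision module works.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float]], float]  # maps a sample to a "is this class" score


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binarize(y: Sequence[str], target: str) -> List[int]:
    """One-vs-all relabeling: the target class becomes 1, every other class 0."""
    return [1 if label == target else 0 for label in y]


def select_per_class(X_val, y_val, candidates: Dict[str, Dict[str, Scorer]]) -> Dict[str, str]:
    """For each binary problem, keep the candidate with the highest validation AUC."""
    chosen = {}
    for cls, models in candidates.items():
        y_bin = binarize(y_val, cls)
        aucs = {name: auc(y_bin, [f(x) for x in X_val]) for name, f in models.items()}
        chosen[cls] = max(aucs, key=aucs.get)
    return chosen


def predict(x, candidates, chosen) -> str:
    """Assumed decision rule: the class whose selected scorer is most confident."""
    return max(chosen, key=lambda cls: candidates[cls][chosen[cls]](x))


# Toy validation set with one feature per sample (hypothetical data).
X_val = [[0.1], [0.2], [0.5], [0.6], [0.9]]
y_val = ["normal", "normal", "dos", "dos", "u2r"]

# Hypothetical stand-ins for trained sampling ensemble models, two per class.
candidates = {
    "normal": {"a": lambda x: 1 - x[0], "b": lambda x: x[0]},
    "dos": {"a": lambda x: 1 - 2 * abs(x[0] - 0.55), "b": lambda x: x[0]},
    "u2r": {"a": lambda x: 0.5, "b": lambda x: x[0]},
}

chosen = select_per_class(X_val, y_val, candidates)
print(chosen)
print(predict([0.95], candidates, chosen))
```

In a real system each candidate would be a fitted resampling ensemble, and the AUC would be computed on held-out data for each binary problem rather than on a five-sample toy set.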

CLC Number: TP309

HUAN Wen-ming, LIN Hai-tao. Design of Intrusion Detection System Based on Sampling Ensemble Algorithm[J]. Computer Science, 2021, 48(11A): 705-712. https://doi.org/10.11896/jsjkx.201100101

