Computer Science ›› 2025, Vol. 52 ›› Issue (10): 357-365. doi: 10.11896/jsjkx.240900142

• Information Security •

Intrusion Detection Method Based on Improved Active Learning

HE Hao, ZHANG Hui

  1. School of Computer Science, Beihang University, Beijing 100191, China
  • Received: 2024-09-24 Revised: 2025-03-07 Online: 2025-10-15 Published: 2025-10-14
  • Corresponding author: ZHANG Hui (hzhang@buaa.edu.cn)
  • About author: HE Hao (haohe@buaa.edu.cn), born in 2001, postgraduate, is a member of CCF (No. V3877G). His main research interests include computer networks and deep learning.
    ZHANG Hui, born in 1968, professor, Ph.D. supervisor. His main research interests include computer networks, network and information security, artificial intelligence, and big data management and mining.
  • Supported by:
    State Key Laboratory of Complex & Critical Software Environment (SKLSDE-2023ZX-07).

Abstract: Conventional deep-learning-based intrusion detection requires a large number of labeled samples to reach high accuracy. Obtaining those labels, however, costs considerable time and labor, which limits adoption in practice. To address this, an intrusion detection method that combines active learning with a convolutional neural network is proposed. An improved adaptive active learning strategy selects the most representative samples for labeling more efficiently, which reduces the computational cost of model training and improves the overall performance of the model. Experiments on the CCF-BDCI-2022 and Malicious-URLs-2021 datasets show that the method outperforms conventional deep-learning-based models in query time and iteration time. On the CCF-BDCI-2022 dataset, it reaches an accuracy of 97.10% with a false positive rate of 1.3%. On the Malicious-URLs-2021 dataset, it reaches an accuracy of 99.05% with a false positive rate of 1.1%. Compared with other methods, it not only achieves better accuracy and a lower false positive rate but also significantly reduces the consumption of computing resources, improving the efficiency and practicality of the model.

Key words: Active learning, Intrusion detection, Convolutional neural network, K-means, Sample annotation
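
The abstract names K-means and the selection of representative samples for labeling but does not spell out the selection rule. The block below is a minimal sketch of one common reading, not the authors' algorithm: cluster the unlabeled pool with K-means and send the sample nearest each centroid to the annotator. The function names (`kmeans`, `select_representatives`), the farthest-point initialization, and the nearest-to-centroid rule are all assumptions made for illustration; the adaptive part of the strategy and the convolutional neural network are not modeled.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iters=50):
    """Plain Lloyd's K-means. Assumes len(points) >= k.

    Initialization is deterministic farthest-point seeding (an assumption
    of this sketch, chosen so the example is reproducible).
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    # Final assignment so labels always match the returned centroids.
    assign = [nearest(p, centroids) for p in points]
    return centroids, assign


def select_representatives(points, labeled, k):
    """Pick one unlabeled index per cluster: the point nearest its centroid.

    `labeled` is a set of indices that already have labels (hypothetical
    bookkeeping, not taken from the paper).
    """
    pool = [i for i in range(len(points)) if i not in labeled]
    centroids, assign = kmeans([points[i] for i in pool], k)
    picks = []
    for c in range(k):
        members = [i for i, a in zip(pool, assign) if a == c]
        if members:
            picks.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return picks


# Toy 2-D feature vectors standing in for traffic or URL features.
points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1),
          (10, 10), (10, 12), (12, 10), (12, 12), (11, 11)]
picks = select_representatives(points, labeled=set(), k=2)
print(picks)  # indices to send to the annotator
```

In an active learning loop, the returned indices would be labeled, added to `labeled`, and the classifier retrained before the next query round; labeling one sample per cluster is one way to cover the pool with few annotations.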

CLC Number: 

  • TP393