Computer Science ›› 2021, Vol. 48 ›› Issue (6A): 459-463. doi: 10.11896/jsjkx.200600161

• Information Security •

Research on Intrusion Detection Classification Based on Random Forest

CAO Yang-chen1, ZHU Guo-sheng1, QI Xiao-yun2, ZOU Jie1

  1. School of Computer and Information Engineering, Hubei University, Wuhan 430062, China
  2. School of Chemistry and Chemical Engineering, Hubei University, Wuhan 430062, China
  • Online: 2021-06-10 Published: 2021-06-17
  • Corresponding author: ZHU Guo-sheng (zhuguosheng@hubu.edu.cn)
  • Author contact: 943407866@qq.com
  • About author: CAO Yang-chen, born in 1996, postgraduate. Her main research interests include machine learning and network traffic analysis.
    ZHU Guo-sheng, born in 1972, Ph.D, professor. His main research interests include next-generation internet and software-defined networks.
  • Supported by:
    CERNET Next Generation Internet Technology Innovation Project; Special Operation Education and Training System Based on Cloud VR and IPv6 (NGII20180507).

Abstract: To detect network attacks effectively, machine learning methods are widely used to classify different types of intrusions. Traditional decision tree methods usually train a single model on the data, which tends to produce large generalization error and over-fitting. To address this problem, this paper introduces the idea of parallel ensemble learning and proposes an intrusion detection model based on random forest. Because every decision tree in the random forest takes part in the final decision, classification accuracy improves. The model is trained and tested on the NSL-KDD data set, and the experimental results show that its accuracy can reach 99.91%, indicating that the model classifies intrusions very well.

Key words: Decision tree, Intrusion detection, Machine learning, Random forest
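
The voting idea summarized in the abstract can be illustrated with a toy sketch. This is not the paper's model. It assumes depth-one decision stumps in place of full decision trees, plain Python in place of any particular library, and a made-up eight-row data set with two hypothetical features (connection duration and failed login count) in place of NSL-KDD. All function names are illustrative. Each stump is trained on a bootstrap sample and a random feature subset, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), n_features)     # random subset of features
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Every tree casts one vote; the most common label wins."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Made-up data: (duration, failed_logins). Not taken from NSL-KDD.
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (10, 5), (11, 6), (12, 7), (13, 8)]
labels = ["normal"] * 4 + ["attack"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)))
print(predict_forest(forest, (20, 9)))
```

A single stump trained on an unlucky bootstrap sample can place its threshold poorly, but the majority vote over many stumps smooths out such individual mistakes, which is the intuition behind the accuracy gain the abstract attributes to the ensemble.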

CLC Number: 

  • TP181