Computer Science (计算机科学), 2021, Vol. 48, Issue (6A): 464-467. doi: 10.11896/jsjkx.200900101

• Information Security •

DDoS Attack Random Forest Detection Method Based on Secondary Screening of Feature Importance

LI Na-na1, WANG Yong1, ZHOU Lin1, ZOU Chun-ming2, TIAN Ying-jie3, GUO Nai-wang3

  1 College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
  2 The Third Research Institute of Ministry of Public Security, Shanghai 200031, China
  3 Institute of Electric Power Research, State Grid Shanghai Electric Power Company, Shanghai 200120, China
  • Online: 2021-06-10 Published: 2021-06-17
  • Corresponding author: WANG Yong (wy616@126.com)
  • Author e-mail: 764529188@qq.com
  • About author: LI Na-na, born in 1992, postgraduate. Her main research interests include robot safety and information security.
    WANG Yong, born in 1973, Ph.D., professor. His main research interests include power system virus analysis and defense.
  • Supported by:
    General Project of National Natural Science Foundation of China (61772327), General Project of Shanghai Natural Science Foundation of China (20ZR1455900), Shanghai Science and Technology Commission Science and Technology Innovation Action Plan (18511105700), Shanghai Science and Technology Commission Power Artificial Intelligence Engineering Technology Research Center Project (19DZ2252800), Qi'anxin Big Data Collaborative Security National Engineering Laboratory Open Project (QAX-201803) and Open-end Fund of State Key Laboratory of Industrial Control Technology, Zhejiang University (ICT1800380).


Abstract: Feature selection is an important step in attack detection algorithms. It is most often performed with recursive feature elimination with cross-validation (RFECV), usually in combination with a machine learning algorithm. However, RFECV is mostly used to select features for a single model, its performance fluctuates readily with changes in the number of features and in the learner, its computational cost is high, and its classification accuracy still needs improvement. To address these problems, this paper proposes a random forest detection method for DDoS attacks based on secondary screening of feature importance. First, the original data set is preprocessed and features are extracted. Second, to select the most relevant variables for the chosen model, the variables are ranked by random forest (RF) importance score according to the RF variable importance criterion. Third, on the basis of this ranking, the cumulative importance of the variables is calculated and the most important variables are obtained. Fourth, the model is trained again on only these most important variables to generate a classification model, which yields a new set of important variables, defined as the current variables. Finally, the importance criterion and the cumulative importance are applied once more to obtain the final optimal variables, which effectively removes outliers and avoids local optima, enabling accurate classification and detection of DDoS attacks. Experimental results show that the method achieves high accuracy and precision, accurately classifies normal traffic and various kinds of DDoS attack traffic, and is suitable for detecting DDoS attacks in big-data settings.

Key words: DDoS attack detection, Feature extraction, Importance criterion, Machine learning, Random forests

CLC Number:

  • TP309.2
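
The two-pass screening described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.9 cumulative-importance threshold, the feature names, and the toy importance scores are all assumptions made for the example, since the abstract does not state them. The `score_fn` argument stands in for "train a random forest on these features and return one importance score per feature".

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: trains a model on the given features and returns
# one importance score per feature.
ScoreFn = Callable[[Sequence[str]], Dict[str, float]]


def select_by_cumulative_importance(
    importances: Dict[str, float], threshold: float = 0.9
) -> List[str]:
    """Rank features by importance and keep the top ones until their
    normalized cumulative importance reaches `threshold`."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


def two_stage_screening(
    features: Sequence[str], score_fn: ScoreFn, threshold: float = 0.9
) -> List[str]:
    """First pass: score all features and screen by cumulative importance.
    Second pass: re-score only the survivors and screen again."""
    first_pass = select_by_cumulative_importance(score_fn(features), threshold)
    return select_by_cumulative_importance(score_fn(first_pass), threshold)


def toy_score_fn(features: Sequence[str]) -> Dict[str, float]:
    """Made-up scores standing in for a (re)trained random forest."""
    if len(features) == 4:
        return {"pkt_rate": 50, "byte_rate": 30, "flow_dur": 15, "ttl": 5}
    return {"pkt_rate": 70, "byte_rate": 25, "flow_dur": 5}


if __name__ == "__main__":
    all_features = ["pkt_rate", "byte_rate", "flow_dur", "ttl"]
    print(two_stage_screening(all_features, toy_score_fn, threshold=0.9))
```

In practice, `score_fn` could fit a random forest on the selected columns and return its per-feature importances, for example by pairing the column names with scikit-learn's `RandomForestClassifier.feature_importances_`. Retraining on the reduced set matters because importances are redistributed once weak features are removed, so the second ranking can differ from the first.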