Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220400151-6. doi: 10.11896/jsjkx.220400151

• Information Security •

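Network Security Situation Assessment Using GA-LightGBM Based on PRF-RFECV Feature Selection

REN Gaoke1, MO Xiuliang2

  1. 1 School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384;
    2 Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin 300384
  • Publication date: 2023-06-10 Release date: 2023-06-12
  • Corresponding author: MO Xiuliang (moxiuliang@163.com)
  • About author: (935208706@qq.com)
  • Funding:
    National Funds-Joint Fund Projects (U1536122); Key Special Project of "Science and Technology Helps Economy 2020" of the Ministry of Science and Technology (SQ2020YFF0413781); Major Project of Tianjin Science and Technology Commission (15ZXDSGX00030)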

Network Security Situation Assessment for GA-LightGBM Based on PRF-RFECV Feature Optimization

REN Gaoke1, MO Xiuliang2   

  1. 1 School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China;
    2 Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, China
  • Online: 2023-06-10 Published: 2023-06-12
  • About author: REN Gaoke, born in 1996, master. His main research interests include network security situation awareness. MO Xiuliang, born in 1969, associate professor. His main research interests include information security and artificial intelligence.
  • Supported by:
    National Funds-Joint Fund Projects (U1536122), Key Special Project of "Science and Technology Helps Economy 2020" of the Ministry of Science and Technology (SQ2020YFF0413781), and Major Project of Tianjin Science and Technology Commission (15ZXDSGX00030).

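Abstract: In network security, traditional machine learning models suffer from long training times and high sensitivity to redundant features, and can no longer cope with an increasingly complex cyberspace. To improve the accuracy and efficiency of network security situation assessment on massive, high-dimensional security data, this paper proposes a GA-LightGBM network security situation assessment model based on PRF-RFECV feature selection. The model first computes feature importances with a parallel random forest (PRF), then applies recursive feature elimination with cross-validation (RFECV) to select the optimal feature set, and finally uses the global search capability of a genetic algorithm (GA) to choose the optimal parameters of the light gradient boosting machine (LightGBM) model before classification. Simulation experiments show that the model outperforms traditional network security situation assessment algorithms in both accuracy and F1 score, and is more efficient.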

Keywords: network security situation, light gradient boosting machine, random forest, genetic algorithm

Abstract: In the field of cyber security, traditional machine learning models suffer from long training times and high sensitivity to redundant features, and can no longer handle an increasingly complex network space. To improve the accuracy and efficiency of network security situation assessment on massive, high-dimensional network security elements, a GA-LightGBM network security situation assessment model based on PRF-RFECV feature optimization is proposed. It first uses a parallel random forest to obtain feature importance, then combines this with recursive feature elimination with cross-validation to select the optimal feature set, and finally uses the global search property of the genetic algorithm to select the optimal parameters of the LightGBM model for classification. Experimental simulation shows that the model outperforms traditional network security situation assessment algorithms in both accuracy and F1 score, and is more efficient.
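The feature-selection stage described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn is available, uses synthetic data in place of the paper's dataset, and treats `RandomForestClassifier` with `n_jobs=-1` as a stand-in for the parallel random forest; the sample sizes, fold count, and scoring choice are illustrative assumptions, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for network security data (assumption: not the paper's dataset).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=4,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# n_jobs=-1 builds trees in parallel; this only approximates a "parallel random forest".
forest = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# RFECV repeatedly drops the least important feature (per the forest's importances)
# and uses cross-validation to decide how many features to keep.
selector = RFECV(
    estimator=forest,
    step=1,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="f1_macro",
    min_features_to_select=2,
)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
X_reduced = selector.transform(X)
print("kept feature indices:", selected)
```

The reduced matrix `X_reduced` is what a downstream classifier such as LightGBM would be trained on in a pipeline of this shape.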

Key words: Network security situation, Light gradient boosting machine, Random forest, Genetic algorithm
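The last step in the abstract, a genetic algorithm searching for model parameters, can be sketched in plain Python. Everything here is an illustrative assumption: the two hyperparameter names and bounds, the tournament selection, uniform crossover, Gaussian mutation, and especially the placeholder `fitness` function, which in a real setup would be replaced by a cross-validated score of the LightGBM classifier.

```python
import random

# Hypothetical search space for two LightGBM-style hyperparameters.
BOUNDS = {"learning_rate": (0.01, 0.3), "num_leaves": (8, 128)}

def fitness(params):
    # Placeholder objective (assumption): best at learning_rate=0.1, num_leaves=64.
    # A real run would return a cross-validated accuracy or F1 score instead.
    lr_term = ((params["learning_rate"] - 0.1) / 0.29) ** 2
    leaf_term = ((params["num_leaves"] - 64) / 120) ** 2
    return -(lr_term + leaf_term)

def random_individual(rng):
    # num_leaves is kept continuous here; round it before passing to a real model.
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def tournament(pop, rng, k=3):
    # Pick k random individuals and return the fittest one.
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal probability.
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in BOUNDS}

def mutate(ind, rng, rate=0.2):
    # Gaussian perturbation, clipped to the allowed range.
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0, 0.1 * (hi - lo))))
    return child

def genetic_search(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual always survives
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=fitness)

best = genetic_search()
print(best)
```

Because the elite individual is carried over unchanged, the best fitness never decreases from one generation to the next, which is the property that makes the search usable with a noisy or expensive objective.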

CLC Number:

  • TP393