Computer Science (计算机科学), 2020, Vol. 47, Issue 11A: 363-367. doi: 10.11896/jsjkx.200100064
赵瑞杰1, 施勇1,2, 张涵1, 龙军1, 薛质1,2
ZHAO Rui-jie1, SHI Yong1,2, ZHANG Han1, LONG Jun1, XUE Zhi1,2
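Abstract: With the rapid growth of the Internet, network attacks have become increasingly frequent. Webshells are a common means of attack, and traditional detection methods can no longer cope with complex, flexible Webshell variants. To address this problem, a Webshell file detection method based on TF-IDF is proposed. The system first classifies Webshell files by type and applies the corresponding preprocessing and transcoding to each type, reducing the impact of obfuscation techniques on detection. It then builds a bag-of-words model and uses the TF-IDF algorithm to extract weighted features. Finally, it trains the detection model with the XGBoost algorithm. In 10-fold cross-validation comparisons against traditional machine learning algorithms, the model that combines TF-IDF preprocessing with XGBoost performs well: it improves on traditional detection methods in accuracy, precision, and recall, and shows stronger robustness and generalization. Detection accuracy reaches 98.09% for PHP files and 97.09% for JSP files.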
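The bag-of-words and TF-IDF weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tokenizer, the TF-IDF variant, or the preprocessing, so the regex tokenizer, the unsmoothed `ln(N / df)` weighting, the three toy samples, and the function names `tokenize` and `tfidf_vectors` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    """Split script source into lowercase identifier-like tokens (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source.lower())


def tfidf_vectors(documents: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))

    # Unsmoothed idf = ln(N / df); a term present in every document gets weight 0.
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocabulary}

    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Term frequency (relative count) multiplied by inverse document frequency.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Toy corpus (hypothetical): one Webshell-like script and two benign scripts.
samples = [
    "<?php eval($_POST['cmd']); ?>",
    "<?php echo 'hello world'; ?>",
    "<?php echo date('Y-m-d'); ?>",
]
vocab, vectors = tfidf_vectors(samples)
```

In this toy corpus the token `php` occurs in every sample and so receives zero weight, while `eval`, which appears only in the Webshell-like sample, receives a positive weight. Vectors of this kind would then be passed to a gradient-boosted tree classifier such as XGBoost and evaluated with 10-fold cross-validation; that step is omitted here because it depends on an external library.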