Computer Science ›› 2018, Vol. 45 ›› Issue (6A): 375-379.

• Information Security •

Research of Webshell Detection Method Based on XGBoost Algorithm

CUI Yan-peng (1,2), SHI Ke-xing (1), HU Jian-wei (1,2)

  1. School of Network and Information Security, Xidian University, Xi'an 710071, China
  2. Network Behavior Research Center, Xidian University, Xi'an 710071, China
  • Online: 2018-06-20 Published: 2018-08-03
  • About the authors: CUI Yan-peng (born 1978), female, Ph.D., associate professor; her main research interest is network security. SHI Ke-xing (born 1988), male, master's degree; his main research interests are network security and database technology; E-mail: skxde@163.com (corresponding author). HU Jian-wei (born 1973), male, Ph.D., associate professor; his main research interests are network security and information countermeasures.

Abstract: Encrypted and non-encrypted Webshells lack uniform code features, and those features are hard to extract. To address this problem, this paper proposes a Webshell detection method based on the XGBoost algorithm. First, a functional analysis of Webshells shows that the vast majority of them perform code execution, file operations, database operations, and compression or obfuscation encoding; these characteristics describe Webshell behavior comprehensively. Accordingly, for non-encrypted Webshells, the main features are the numbers of occurrences of the related functions. For encrypted Webshells, four parameters derived from the static characteristics of the code serve as features: the file's index of coincidence, information entropy, length of the longest string, and compression ratio. Finally, the two kinds of features are combined into a single Webshell feature set, which mitigates the problem of incomplete feature coverage. Experimental results show that the proposed method detects both types of Webshell effectively and, compared with traditional detection methods that target a single type of Webshell, improves the efficiency and accuracy of Webshell detection.

Key words: Webshell detection, Web security, XGBoost algorithm, Machine learning
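
The features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: the watch-list `SENSITIVE_FUNCTIONS` is hypothetical, the statistics are computed over raw bytes, the "longest string" is approximated as the longest whitespace-delimited token, and the compression ratio uses zlib. The helper names are illustrative, not the authors'.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical watch-list: the abstract names the behavior categories
# (code execution, file, database, encoding) but not the functions.
SENSITIVE_FUNCTIONS = (
    "eval", "assert", "system", "exec",
    "fopen", "fwrite", "mysql_query",
    "base64_decode", "gzinflate",
)


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def longest_token_length(data: bytes) -> int:
    """Length of the longest whitespace-delimited token (assumed definition)."""
    return max((len(token) for token in data.split()), default=0)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib (assumed codec)."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def function_counts(source: str) -> dict:
    """Count call-like occurrences `name(` of each watched function."""
    return {
        name: len(re.findall(r"\b" + re.escape(name) + r"\s*\(", source))
        for name in SENSITIVE_FUNCTIONS
    }


def extract_features(source: str) -> list:
    """Concatenate function counts with the four file-level statistics."""
    data = source.encode("utf-8", errors="replace")
    counts = function_counts(source)
    return [counts[name] for name in SENSITIVE_FUNCTIONS] + [
        index_of_coincidence(data),
        shannon_entropy(data),
        longest_token_length(data),
        compression_ratio(data),
    ]
```

A vector produced by `extract_features` could then be fed to a gradient-boosted tree classifier such as the `XGBClassifier` class of the xgboost package; the training step is omitted here because the abstract does not state the paper's parameters or dataset.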

CLC Number: TP393