Research of Webshell Detection Method Based on XGBoost Algorithm

Abstract

Abstract: Encrypted and non-encrypted Webshells do not share uniform code characteristics, and their features are difficult to extract. To address this problem, this paper proposes a Webshell detection method based on the XGBoost algorithm. First, the paper analyzes the features of Webshells and finds that most of them involve code execution, file operations, database operations, compression, obfuscation encoding and so on, which together describe Webshell behavior comprehensively. Therefore, for non-encrypted Webshells, the main features are the numbers of occurrences of the related functions. For encrypted Webshells, four statistical characteristics of the code are taken as features: the file's index of coincidence, its information entropy, the length of its longest string, and its compression ratio. Finally, the two types of features are combined into a single Webshell feature set, which mitigates the problem of insufficient Webshell feature coverage. The experimental results show that the proposed method achieves high performance: compared with traditional single-type Webshell detection, it improves both the efficiency and the accuracy of Webshell detection.
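The two feature groups described in the abstract can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function list `SUSPICIOUS_FUNCTIONS`, all helper names, and the exact definitions are assumptions. In particular, the "longest string" is taken to be the longest whitespace-delimited token, and the compression ratio is taken to be compressed size divided by original size using zlib. The resulting vector is the kind of input a gradient-boosted tree classifier such as XGBoost would be trained on.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical, illustrative list of PHP function names covering code execution,
# file operations, database operations, compression and obfuscation encoding.
# The list actually used in the paper is not given in the abstract.
SUSPICIOUS_FUNCTIONS = ["eval", "assert", "system", "exec", "fopen", "fwrite",
                        "mysql_query", "gzinflate", "base64_decode", "str_rot13"]


def function_counts(source: str) -> dict:
    """Count call-like occurrences (name followed by '(') of each listed function."""
    return {
        name: len(re.findall(r"\b" + re.escape(name) + r"\s*\(", source))
        for name in SUSPICIOUS_FUNCTIONS
    }


def index_of_coincidence(data: bytes) -> float:
    """Probability that two distinct randomly chosen bytes of the file are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def longest_token_length(source: str) -> int:
    """Length of the longest whitespace-delimited token (assumed definition)."""
    return max((len(tok) for tok in source.split()), default=0)


def compression_ratio(data: bytes) -> float:
    """Compressed size over original size (assumed definition, zlib defaults)."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def extract_features(source: str) -> list:
    """Concatenate function-occurrence counts with the four statistical features."""
    data = source.encode("utf-8", errors="replace")
    counts = function_counts(source)
    return [counts[name] for name in SUSPICIOUS_FUNCTIONS] + [
        index_of_coincidence(data),
        shannon_entropy(data),
        longest_token_length(source),
        compression_ratio(data),
    ]
```

Calling `extract_features(open(path, errors="replace").read())` on each sample file would yield one fixed-length row per file. Combining both groups in one vector mirrors the abstract's point that neither group alone covers encrypted and non-encrypted samples.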

Key words: Machine learning, Web security, Webshell detection, XGBoost algorithm

CLC Number: TP393

CUI Yan-peng,SHI Ke-xing,HU Jian-wei. Research of Webshell Detection Method Based on XGBoost Algorithm[J].Computer Science, 2018, 45(6A): 375-379.
