Malicious Web Request Detection Technology Based on CNN

doi:10.11896/jsjkx.181202455

Abstract

Abstract: Existing work on malicious Web request detection based on convolutional neural networks (CNNs) inspects only the URL portion of a request, and each study represents the raw data numerically in a different way, which leads to low detection efficiency and accuracy. To improve CNN performance in this field, this paper extends existing work by merging several other HTTP request parameters with the URL, and designs comparative experiments that use the datasets HTTP data set CSIC 2010 and DEV_ACCESS as raw data. First, six numerical vectorization methods are applied to the raw string input. Each representation is then fed to the designed CNN, yielding six different models after training. At the same time, the classical algorithms HMM, SVM and RNN are trained on the same training data set to obtain control models. Finally, all nine models are evaluated on the same evaluation data set. The results show that, within the multi-parameter detection method, the CNN that represents the raw data by combining vocabulary mapping with an embedding layer inside the network achieves 99.87% accuracy and a 98.92% F1 score. Compared with the other eight models, this improves accuracy by 0.4 to 7.7 percentage points and the F1 score by 0.3 to 13 percentage points. The experiments demonstrate that CNN-based multi-parameter malicious Web request detection has a clear advantage, and that representing the raw data with vocabulary mapping and the network's internal embedding layer gives the model its best detection performance.

Key words: Convolutional neural network, Deep learning, Malicious Web request detection, Web security
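
The vocabulary-mapping step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which HTTP parameters are merged with the URL, whether the vocabulary is built over characters or words, or how sequences are padded, so everything below is an assumption made for illustration: the helper names `merge_request`, `build_vocab` and `encode` are hypothetical, the vocabulary is character-level, and indices 0 and 1 are reserved for padding and unknown symbols.

```python
from typing import Dict, List

# Reserved indices (an assumed convention, not taken from the paper).
PAD, UNK = 0, 1


def merge_request(method: str, url: str, headers: Dict[str, str], body: str = "") -> str:
    """Join the URL with other request parameters into a single string.

    Which fields to merge is an assumption made for this sketch.
    """
    header_part = "&".join(f"{k}={v}" for k, v in sorted(headers.items()))
    return " ".join(part for part in (method, url, header_part, body) if part)


def build_vocab(samples: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer index, starting at 2."""
    vocab: Dict[str, int] = {}
    for text in samples:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a string to a fixed-length index sequence (truncate, then pad)."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    request = merge_request("GET", "/index.jsp?id=1", {"Cookie": "a=b"})
    vocab = build_vocab([request])
    print(encode(request, vocab, max_len=40))
```

In a setup like the one the abstract describes, fixed-length index sequences of this kind would be passed to an embedding layer inside the network, which learns a dense vector for each index during training instead of relying on a precomputed representation.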

CLC number: TP183

CUI Yan-peng, LIU Mi, HU Jian-wei. Malicious Web Request Detection Technology Based on CNN[J]. Computer Science, 2020, 47(2): 281-286. https://doi.org/10.11896/jsjkx.181202455

