Computer Science (计算机科学), 2024, Vol. 51, Issue 6A: 230300191-5. doi: 10.11896/jsjkx.230300191
WANG Li (王丽)1,2,3, CHEN Gang (陈刚)1,3, XIA Mingshan (夏明山)1,2, HU Hao (胡皓)1
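Abstract: When existing deep-learning-based word embedding methods are applied to Web anomaly detection, words that do not appear in the corpus (out-of-vocabulary, OOV) are usually mapped to an "unknown" token and fed to the model as a zero or random vector during training. This ignores the context in which the unknown word appears within the Web request. Meanwhile, when developing Web systems, programmers tend to design request paths that follow certain patterns, driven by personal habit and the desire for readable code. Taking into account these request patterns and the semantic correlation between words, this paper studies DUWe (Dynamic Unknown Word Embedding), a Word2vec-based dynamic representation method for unknown words that assigns a vector to an unknown word by analyzing the contextual relationships among the words in the Web request path. Experimental evaluation on the CSIC-2010 and WAF Dataset datasets shows that adding the unknown-word representation performs better than static Word2vec feature extraction alone: accuracy, precision, recall, and F1-score all improve, and training time is reduced by up to 1.14 times.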
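The abstract does not spell out how DUWe derives a vector from context, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that an OOV token is represented by the mean of the known embeddings inside a small window around it in the request path, and that it falls back to a zero vector when no neighbour is known. The names `tokenize_path` and `embed_tokens`, the window size, the toy two-dimensional vocabulary, and the request path in the usage note are all hypothetical; in practice the vocabulary would come from a trained Word2vec model.

```python
import re
from typing import Dict, List

Vector = List[float]


def tokenize_path(request: str) -> List[str]:
    """Split a request path into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", request) if t]


def embed_tokens(tokens: List[str], vocab: Dict[str, Vector], window: int = 2) -> List[Vector]:
    """Return one vector per token.

    Known tokens keep their static embedding. An OOV token gets the mean of
    the known embeddings within `window` positions of it (an assumption made
    for this sketch), or a zero vector if none of its neighbours is known.
    """
    dim = len(next(iter(vocab.values())))
    vectors: List[Vector] = []
    for i, tok in enumerate(tokens):
        if tok in vocab:
            vectors.append(list(vocab[tok]))
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [vocab[tokens[j]] for j in range(lo, hi)
                   if j != i and tokens[j] in vocab]
        if context:
            vectors.append([sum(v[k] for v in context) / len(context)
                            for k in range(dim)])
        else:
            vectors.append([0.0] * dim)  # static fallback when there is no usable context
    return vectors


# Toy vocabulary standing in for trained Word2vec vectors (hypothetical values).
toy_vocab = {"tienda1": [1.0, 0.0], "publico": [0.0, 1.0], "jsp": [1.0, 1.0]}
tokens = tokenize_path("/tienda1/publico/anadir.jsp")
vectors = embed_tokens(tokens, toy_vocab, window=1)
```

Here `anadir` is absent from the toy vocabulary, so with `window=1` its vector is the mean of the vectors for `publico` and `jsp`, whereas the static baseline described in the abstract would assign it a zero or random vector regardless of its neighbours.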