Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230300191-5. doi: 10.11896/jsjkx.230300191

• Information Security •

DUWe: Dynamic Unknown Word Embedding Approach for Web Anomaly Detection

WANG Li1,2,3, CHEN Gang1,3, XIA Mingshan1,2, HU Hao1

    1 Institute of High Energy Physics, Chinese Academy of Sciences (CAS), Beijing 100049, China
    2 Spallation Neutron Source Science Center (SNSSC), Dongguan, Guangdong 523803, China
    3 University of Chinese Academy of Sciences, Beijing 100049, China
  • Published: 2024-06-06
  • Corresponding author: WANG Li (wangli320@ihep.ac.cn)
  • About author: WANG Li, born in 1987, Ph.D, engineer. Her main research interests include network technology and network security.
  • Supported by:
    National Natural Science Foundation of China (11905239, 12005248, 12105303).

Abstract: When existing deep-learning-based word embedding methods are applied to Web anomaly detection, words that do not appear in the corpus (out of vocabulary, OOV) are usually mapped to an unknown token and given a zero or random vector as model input, which ignores the context of the unknown word within the Web request. At the same time, when developing Web systems, programmers tend to design request paths that follow certain patterns, both out of personal habit and to keep the code readable, so the words in a Web request are often semantically related. Taking these request patterns and the semantic correlation between words into account, this paper proposes DUWe (Dynamic Unknown Word Embedding), a Word2vec-based dynamic representation method for unknown words that assigns a vector to each unknown word by analyzing the context of the words in the Web request path. Experimental evaluation on the CSIC-2010 and WAF Dataset datasets shows that adding the unknown word representation performs better than static Word2vec feature extraction alone: accuracy, precision, recall and F1-score all improve, and training time is reduced by up to 1.14 times.

Key words: Unknown word, Web anomaly detection, Dynamic unknown word embedding, Word embedding optimization, Deep learning
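
The abstract contrasts two treatments of an OOV token: a fixed zero (or random) vector, and a vector inferred from the surrounding words of the request path. The Python sketch below illustrates that contrast only. The abstract does not give the DUWe inference rule, so averaging the in-vocabulary neighbours inside a fixed window is an assumption made here for illustration. The embedding table `TOY_VECTORS`, the helpers `tokenize` and `embed_with_context`, the `window` parameter and the sample request are all hypothetical; a trained Word2vec model would supply the real vectors.

```python
import re

# Hypothetical toy embedding table standing in for a trained Word2vec model.
TOY_VECTORS = {
    "get": [1.0, 0.0, 0.0],
    "tienda1": [0.0, 1.0, 0.0],
    "publico": [0.0, 0.0, 1.0],
    "jsp": [1.0, 1.0, 0.0],
}


def tokenize(request):
    """Split a request line into lowercase tokens on non-alphanumeric runs."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", request.lower()) if t]


def embed_with_context(tokens, vectors, window=2):
    """Return one vector per token.

    Known tokens use their table entry. An unknown token gets the mean of the
    known tokens within `window` positions on either side (an assumed rule,
    not necessarily the one used by DUWe). If no neighbour is known, the
    function falls back to the zero-vector baseline.
    """
    dim = len(next(iter(vectors.values())))
    embedded = []
    for i, token in enumerate(tokens):
        if token in vectors:
            embedded.append(list(vectors[token]))
            continue
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        neighbours = [
            vectors[tokens[j]]
            for j in range(start, stop)
            if j != i and tokens[j] in vectors
        ]
        if neighbours:
            # Component-wise mean of the neighbouring vectors.
            embedded.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
        else:
            embedded.append([0.0] * dim)
    return embedded


# Illustrative request; "anadir" is missing from the toy table.
example_tokens = tokenize("GET /tienda1/publico/anadir.jsp")
example_vectors = embed_with_context(example_tokens, TOY_VECTORS, window=2)
```

With this rule the unknown token inherits a representation from the path segments around it, so two unknown words that occur in similar request patterns receive similar vectors, whereas the zero-vector baseline makes every unknown word look the same to the downstream model.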

CLC Number:

  • TP393