Computer Science, 2024, Vol. 51, Issue (6A): 230300191-5. doi: 10.11896/jsjkx.230300191

• Information Security •

DUWe: Dynamic Unknown Word Embedding Approach for Web Anomaly Detection

WANG Li1,2,3, CHEN Gang1,3, XIA Mingshan1,2, HU Hao1   

  1 Institute of High Energy Physics, Chinese Academy of Sciences (CAS), Beijing 100049, China
  2 Spallation Neutron Source Science Center (SNSSC), Dongguan, Guangdong 523803, China
  3 University of Chinese Academy of Sciences, Beijing 100049, China
  • Published: 2024-06-06
  • About author: WANG Li, born in 1987, Ph.D., engineer. Her main research interests include network technology and network security.
  • Supported by: National Natural Science Foundation of China (11905239, 12005248, 12105303).

Abstract: When existing word embedding methods based on deep learning models are used to detect Web anomalies, words that do not appear in the training corpus, known as out-of-vocabulary (OOV) words, are usually marked as unknown and given a zero or random vector as input to the deep model, without considering the context in which the unknown word occurs in the Web request. During development, programmers often design request paths according to a certain pattern to make the code more readable, which usually makes Web requests semantically related. Because Web requests follow such request patterns and their semantics are correlated through those patterns, this paper proposes DUWe, a dynamic unknown word embedding method based on Word2vec that assigns a representation to an unknown word by inferring it from the word's context. Evaluation on the CSIC-2010 and WAF datasets shows that adding the unknown word embedding method performs better than Word2vec feature extraction alone: accuracy, precision, recall and F1-score all improve, and the maximum reduction in training time is 1.14 times.
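The idea of inferring an unknown word's representation from its context can be illustrated with a minimal sketch. This is not the DUWe algorithm itself, whose details the abstract does not give; it assumes the simplest possible inference rule, averaging the vectors of known neighbouring tokens within a fixed window, and falls back to a zero vector when no neighbour is known. The function names, the window size, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def infer_unknown_vector(
    tokens: Sequence[str],
    index: int,
    embeddings: Dict[str, Vector],
    window: int = 2,
) -> Vector:
    """Estimate a vector for tokens[index] as the mean of its known neighbours.

    Assumption: plain averaging over a symmetric window. The paper's actual
    inference rule may differ. `embeddings` must be non-empty.
    """
    dim = len(next(iter(embeddings.values())))
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    neighbours = [
        embeddings[tokens[i]]
        for i in range(lo, hi)
        if i != index and tokens[i] in embeddings
    ]
    if not neighbours:
        # No usable context: fall back to the zero vector used by the baseline.
        return [0.0] * dim
    return [sum(column) / len(neighbours) for column in zip(*neighbours)]


def embed_request(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    window: int = 2,
) -> List[Vector]:
    """Look up known tokens; infer a vector for each out-of-vocabulary token."""
    return [
        embeddings[token]
        if token in embeddings
        else infer_unknown_vector(tokens, i, embeddings, window)
        for i, token in enumerate(tokens)
    ]


# Toy vocabulary standing in for a trained Word2vec model (hypothetical values).
toy_embeddings = {
    "shop": [1.0, 0.0],
    "cart": [0.0, 1.0],
    "add": [1.0, 1.0],
}

# "newitem" is out of vocabulary; its vector is inferred from its neighbours.
request_tokens = ["shop", "cart", "newitem", "add"]
vectors = embed_request(request_tokens, toy_embeddings)
```

In this sketch the unknown token receives a vector that depends on the request it appears in, so the same unseen word can be represented differently in different requests, which is one plausible reading of "dynamic" in the method's name.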

Key words: Unknown word, Web anomaly detection, Dynamic unknown word embedding, Word embedding optimization, Deep learning

CLC Number: TP393