Computer Science, 2023, Vol. 50, Issue (6A): 220600172-6. doi: 10.11896/jsjkx.220600172

• Information Security •

Study on SQL Injection Detection Based on FlexUDA Model

WANG Qingyu, WANG Hairui, ZHU Guifu, MENG Shunjian

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2023-06-10  Published: 2023-06-12
  • Corresponding author: WANG Hairui (hrwang88@163.com)
  • Author email: 2078125631@qq.com
  • About author: WANG Qingyu, born in 1995, postgraduate. His main research interests include cyber security and machine learning. WANG Hairui, born in 1969, Ph.D., professor, is a member of the China Computer Federation. His main research interests include multimedia intelligence technology, network control technology, and embedded application technology.
  • Supported by:
    National Natural Science Foundation of China (61863016, 61263023).

Abstract: When deep learning methods are used to detect SQL injection, a shortage of labeled data easily causes the model to overfit. To address this problem, a FlexUDA model based on semi-supervised learning is proposed. First, the collected data are preprocessed by decoding, generalization, and word segmentation. The unlabeled data are then augmented according to their TF-IDF values, and both the original and the augmented data are vectorized with an algorithm that fuses TF-IDF and Word2Vec. Finally, the FlexUDA model is trained, and the trained model is compared with other models. Experimental results show that the FlexUDA model, trained on only 1000 labeled samples and 100000 unlabeled samples, achieves an accuracy of 99.42% and a recall of 99.23%. Compared with other models trained in a supervised manner, it generalizes better and effectively mitigates the overfitting caused by insufficient labeled data in SQL injection detection.

Key words: SQL injection detection, semi-supervised learning, unsupervised data augmentation, dynamic threshold

CLC number: TP393.08
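The abstract names three preprocessing steps (decoding, generalization and word segmentation) but does not give the concrete rules. The following is a minimal sketch under stated assumptions. Decoding is taken to mean repeated URL decoding plus lowercasing. Generalization replaces hexadecimal and decimal literals with the placeholders `hex` and `0`. Word segmentation is a regular-expression tokenizer that keeps SQL comment markers and comparison operators as single tokens. The helper names `decode`, `generalize`, `tokenize` and `preprocess` and all of the rules are hypothetical; they are not the authors' implementation.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical rules and helper names for illustration only; the paper does not
# publish its generalization rules. Hex literals -> "hex", decimal literals -> "0".
# Order matters: hex literals must be replaced before decimal ones.
GENERALIZE_RULES = [
    (re.compile(r"\b0x[0-9a-f]+\b"), "hex"),
    (re.compile(r"\b\d+\b"), "0"),
]

# Identifiers/keywords, numbers, SQL comment markers and multi-character
# operators, then any other single non-space character (quotes, '=', '(' ...).
TOKEN_RE = re.compile(r"[a-z_][a-z0-9_]*|\d+|--|#|/\*|\*/|<=|>=|<>|!=|\|\||\S")


def decode(payload: str, max_rounds: int = 3) -> str:
    """URL-decode repeatedly (to undo double encoding such as %2527), then lowercase."""
    for _ in range(max_rounds):
        decoded = unquote_plus(payload)
        if decoded == payload:
            break
        payload = decoded
    return payload.lower()


def generalize(text: str) -> str:
    """Replace literal values with placeholders so the vocabulary stays small."""
    for pattern, placeholder in GENERALIZE_RULES:
        text = pattern.sub(placeholder, text)
    return text


def tokenize(text: str) -> list[str]:
    """Split a decoded, generalized payload into word-level tokens."""
    return TOKEN_RE.findall(text)


def preprocess(payload: str) -> list[str]:
    return tokenize(generalize(decode(payload)))


if __name__ == "__main__":
    print(preprocess("id%3D1%27%20or%201%3D1--"))
    # ['id', '=', '0', "'", 'or', '0', '=', '0', '--']
```

Generalizing literals means that `1=1` and `2=2` produce the same token sequence. This keeps the vocabulary that the later vectorization step must cover small.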
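The abstract states that the unlabeled data are augmented "by calculating the TF-IDF value" but gives no further detail. A plausible reading is the TF-IDF word replacement used in UDA (Xie et al.): tokens with a low TF-IDF score are likely to be replaced, while high-scoring keywords are kept. The sketch below assumes a simplified form of that scheme. Replacement words are drawn from the vocabulary with weights that favor low-IDF (common) words, which simplifies UDA's corpus-level statistics. The names `idf_table`, `replacement_weights` and `tfidf_augment` and the default `token_prob=0.3` are hypothetical choices.

```python
import math
import random
from collections import Counter

# Hypothetical helpers sketching UDA-style TF-IDF word replacement (simplified);
# not the authors' code.


def idf_table(corpus):
    """IDF over a tokenized corpus: log(N / df(w)), where df counts documents."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def replacement_weights(idf, eps=1e-6):
    """Sampling weights for replacement words: low-IDF (common) words are preferred."""
    top = max(idf.values())
    return {word: top - value + eps for word, value in idf.items()}


def tfidf_augment(tokens, idf, weights, token_prob=0.3, rng=random):
    """Keep high-TF-IDF keywords and randomly swap low-TF-IDF filler tokens.

    Each token's replacement probability is proportional to (max_score - score),
    scaled so that roughly token_prob * len(tokens) tokens change on average.
    """
    if not tokens:
        return []
    tf = Counter(tokens)
    scores = [tf[w] / len(tokens) * idf.get(w, 0.0) for w in tokens]
    top = max(scores)
    gaps = [top - s for s in scores]
    total = sum(gaps)
    if total == 0:  # all tokens equally informative: nothing to prefer
        return list(tokens)
    words, word_weights = zip(*weights.items())
    augmented = []
    for word, gap in zip(tokens, gaps):
        prob = min(1.0, gap / total * token_prob * len(tokens))
        if rng.random() < prob:
            augmented.append(rng.choices(words, weights=word_weights)[0])
        else:
            augmented.append(word)
    return augmented


if __name__ == "__main__":
    corpus = [["select", "0", "from", "users"],
              ["select", "name", "from", "users", "where", "id", "=", "0"],
              ["0", "union", "select", "0", "--"]]
    idf = idf_table(corpus)
    print(tfidf_augment(corpus[1], idf, replacement_weights(idf), rng=random.Random(0)))
```

The highest-scoring token in a sample gets a replacement probability of zero. As a result, the most distinctive keyword in each payload always survives augmentation.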
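The abstract does not specify how TF-IDF and Word2Vec are fused. One common scheme, assumed here, represents each sample as the TF-IDF-weighted average of its word vectors. In practice the vectors would come from a Word2Vec model trained on the tokenized payloads, for example with gensim's `Word2Vec`. In this sketch `word_vectors` is a plain dictionary, and the function name `fused_vector` is hypothetical.

```python
from collections import Counter


def fused_vector(tokens, idf, word_vectors, dim):
    """Hypothetical fusion: TF-IDF-weighted average of a sample's word vectors.

    tokens:       preprocessed tokens of one sample
    idf:          word -> IDF value computed on the training corpus
    word_vectors: word -> list of `dim` floats (e.g. from a trained Word2Vec model)
    Tokens without a vector are skipped; a zero vector is returned if nothing contributes.
    """
    tf = Counter(tokens)
    acc = [0.0] * dim
    total_weight = 0.0
    for word, count in tf.items():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        weight = count / len(tokens) * idf.get(word, 0.0)  # TF-IDF of this word in the sample
        for i in range(dim):
            acc[i] += weight * vec[i]
        total_weight += weight
    if total_weight == 0.0:
        return acc
    return [x / total_weight for x in acc]


if __name__ == "__main__":
    vectors = {"union": [1.0, 0.0], "select": [0.0, 1.0]}
    print(fused_vector(["union", "select"], {"union": 2.0, "select": 1.0}, vectors, dim=2))
```

Weighting by TF-IDF lets rare, attack-specific tokens such as `union` dominate the sample vector more than ubiquitous tokens do.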
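The model name FlexUDA and the keyword "dynamic threshold" suggest that UDA's consistency training is combined with FlexMatch's per-class flexible thresholds (curriculum pseudo labeling). The abstract does not confirm this, so the following sketch follows the published FlexMatch formulation rather than the authors' code.

1. For each class c, the learning effect σ(c) counts the unlabeled samples predicted as c with confidence above a fixed τ.
2. With threshold warm-up, σ(c) is normalized to β(c) by dividing it by the larger of max σ and the number of samples that have not yet exceeded τ.
3. The flexible threshold is T(c) = τ·β(c)/(2 − β(c)).

The value τ = 0.95 and all function names are assumptions.

```python
# Hypothetical helpers sketching FlexMatch-style flexible thresholds; not the authors' code.


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def update_selected(selected_label, indices, probs, tau=0.95):
    """Record the predicted class of every unlabeled sample whose confidence exceeds tau."""
    for n, p in zip(indices, probs):
        c = argmax(p)
        if p[c] > tau:
            selected_label[n] = c


def flexible_thresholds(selected_label, num_classes, tau=0.95, warmup=True):
    """Per-class thresholds T(c) = tau * beta(c) / (2 - beta(c)).

    selected_label[n] is the last class predicted for unlabeled sample n with
    confidence above tau, or -1 if that has not happened yet.
    """
    sigma = [0] * num_classes  # learning effect per class
    for c in selected_label:
        if c >= 0:
            sigma[c] += 1
    not_selected = selected_label.count(-1)
    denom = max(max(sigma), not_selected) if warmup else max(sigma)
    beta = [s / denom if denom else 0.0 for s in sigma]
    return [tau * b / (2 - b) for b in beta]


def confidence_mask(probs, thresholds):
    """True where the top-class probability clears that class's threshold."""
    return [p[argmax(p)] >= thresholds[argmax(p)] for p in probs]


if __name__ == "__main__":
    selected = [0, 0, 0, 0, 1, -1, -1, -1]  # 8 unlabeled samples, 2 classes
    thresholds = flexible_thresholds(selected, num_classes=2)
    print([round(t, 4) for t in thresholds])  # [0.95, 0.1357]
    print(confidence_mask([[0.9, 0.1], [0.2, 0.8]], thresholds))  # [False, True]
```

In the binary case, the class the model has learned less well (here class 1) receives a much lower threshold. More of its pseudo-labeled samples therefore enter the unsupervised loss early in training.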
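UDA's training objective has two parts. A supervised cross-entropy term is computed on labeled data. A consistency term pushes the prediction on an augmented unlabeled sample toward the sharpened prediction on the original sample. The sketch below writes this objective in plain Python on precomputed logits, with unlabeled samples masked by per-class confidence thresholds (for example, FlexMatch-style ones).

Several details are assumptions rather than the paper's settings:

- the consistency term is averaged over all unlabeled samples, masked ones included;
- λ = 1;
- the sharpening temperature is 0.4;
- all function names are hypothetical.

A real implementation would compute these losses in a deep learning framework and stop gradients through the target.

```python
import math

# Hypothetical helpers sketching a UDA-style objective with confidence masking;
# not the authors' code. Operates on plain lists of logits (no autograd).


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def kl_div(p, q):
    """KL(p || q) for two discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uda_loss(labeled, unlabeled, thresholds, lam=1.0, temperature=0.4):
    """Supervised cross-entropy plus a masked, sharpened consistency term.

    labeled:    list of (logits, label) pairs
    unlabeled:  list of (logits_original, logits_augmented) pairs
    thresholds: per-class confidence thresholds used to mask unlabeled samples
    """
    sup = sum(cross_entropy(z, y) for z, y in labeled) / len(labeled)
    cons = 0.0
    for z_orig, z_aug in unlabeled:
        p_orig = softmax(z_orig)
        c = max(range(len(p_orig)), key=p_orig.__getitem__)
        if p_orig[c] < thresholds[c]:
            continue  # low-confidence sample: masked out of the consistency loss
        target = softmax(z_orig, temperature)  # sharpened target (stop-gradient in a real model)
        cons += kl_div(target, softmax(z_aug))
    cons /= len(unlabeled)  # averaged over all unlabeled samples, masked ones count as 0
    return sup + lam * cons


if __name__ == "__main__":
    labeled = [([2.0, 0.0], 0)]
    unlabeled = [([3.0, 0.0], [0.5, 0.2]), ([0.1, 0.0], [0.0, 0.1])]
    print(uda_loss(labeled, unlabeled, thresholds=[0.9, 0.9]))
```

The consistency term needs no labels, which is how the 100000 unlabeled samples contribute to training alongside only 1000 labeled ones.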
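The abstract reports two metrics. Accuracy is the fraction of all samples classified correctly. Recall is the fraction of actual SQL injection samples that the model detects. The small helper below makes both definitions concrete, assuming that label 1 marks an injection; the name `accuracy_recall` is hypothetical.

```python
def accuracy_recall(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy over all samples and recall on the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # detected injections
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed injections
    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # 1 = SQL injection, 0 = normal request (label convention assumed)
    print(accuracy_recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # (0.6, 0.6666666666666666)
```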