Computer Science, 2023, Vol. 50, Issue (6): 283-290. doi: 10.11896/jsjkx.220600131

• Computer Network

Function Level Code Vulnerability Detection Method of Graph Neural Network Based on Extended AST

GU Shouke, CHEN Wen   

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2022-06-13  Revised: 2023-03-17  Online: 2023-06-15  Published: 2023-06-06
  • About author: GU Shouke, born in 1998, postgraduate. His main research interests include graph neural networks, cyber security and vulnerability mining. CHEN Wen, born in 1983, Ph.D., associate professor, master's supervisor, is a member of the China Computer Federation. His main research interests include network security, information hiding and data mining.
  • Supported by:
    National Key Research and Development Program of China (020YFB1805405, 2019QY0800), National Natural Science Foundation of China (U1736212, 61872255, U19A2068) and Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province (MSSB-2020-01).

Abstract: As the number of software vulnerabilities grows year by year, security problems are becoming increasingly serious. Detecting vulnerabilities in source code at the delivery stage of a software project can effectively prevent them from surfacing later at run time, and discovering code vulnerabilities depends on an effective code representation. Traditional representations based on software metrics correlate only weakly with vulnerabilities, so they struggle to capture vulnerability information efficiently. In recent years, machine learning has offered a new approach to intelligent vulnerability discovery, but existing methods still lose key information when extracting code features. To address these problems, control flow edges, data flow edges and next-token edges are added to the traditional abstract syntax tree (AST) to generate an extended abstract syntax tree (EXAST) graph structure. EXAST represents the source code so that its structural information is better preserved, and the word vector embedding model word2vec is used to initialize the code information as numerical vectors that the model can learn from. In addition, a gated recurrent unit (GRU) is introduced into a traditional graph neural network (GNN) to build the model; this alleviates vanishing gradients and strengthens the propagation of long-range information through the graph structure, which reinforces the sequential relationships of code execution and improves the accuracy of vulnerability detection. Finally, the model is trained and tested on the SARD datasets to perform function-level code vulnerability detection. Compared with traditional vulnerability detection methods, it improves accuracy by 32.54% and the F1 score by 44.99. Experimental results confirm the effectiveness of the method for code vulnerability detection.
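The pipeline in the abstract (extend an AST with control-flow, data-flow and next-token edges, initialize node vectors, then propagate them with a GRU-gated graph neural network and classify the whole function) can be illustrated in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' implementation. Python's standard `ast` module stands in for a parser of the SARD source code. The control-flow rule (each statement linked to the next one in its block) and the data-flow rule (each use linked to the latest definition seen in source order) are deliberately simplified. Seeded random vectors stand in for trained weights and word2vec embeddings. The edge-type names, the helper names, the two propagation steps and the mean-pool readout are all hypothetical. The node update applies the standard GRU equations to the sum of typed incoming messages, in the style of gated graph neural networks.

```python
import ast
import math
import random

# Hypothetical edge-type names; the abstract lists these edge kinds but not their encoding.
EDGE_TYPES = ("ast", "control_flow", "data_flow", "next_token")

def node_label(node):
    """Identifier/literal text for leaf nodes, AST node type name otherwise."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value) if isinstance(node, ast.Constant) else type(node).__name__

def build_exast(source):
    """Return (labels, edges): an AST plus simplified control/data/next-token edges."""
    labels, edges, tokens, stmt_ids = [], [], [], {}
    def visit(node, parent):
        if isinstance(node, ast.expr_context):
            return  # skip Load/Store markers
        nid = len(labels)
        labels.append(node_label(node))
        if parent is not None:
            edges.append((parent, nid, "ast"))
        if isinstance(node, ast.stmt):
            stmt_ids[node] = nid
        if isinstance(node, (ast.Name, ast.arg, ast.Constant)):
            tokens.append((node.lineno, node.col_offset, nid, node))
        for child in ast.iter_child_nodes(node):
            visit(child, nid)
        # Control flow (simplified assumption): each statement -> next one in its block.
        for field in ("body", "orelse"):
            block = getattr(node, field, None)
            if isinstance(block, list):
                for a, b in zip(block, block[1:]):
                    edges.append((stmt_ids[a], stmt_ids[b], "control_flow"))

    visit(ast.parse(source), None)
    tokens.sort(key=lambda t: (t[0], t[1]))  # leaf tokens in source order
    for (_, _, a, _), (_, _, b, _) in zip(tokens, tokens[1:]):
        edges.append((a, b, "next_token"))
    last_def = {}
    for _, _, nid, node in tokens:
        # Data flow (simplified assumption): link each use to the latest definition.
        if isinstance(node, ast.arg) or isinstance(getattr(node, "ctx", None), ast.Store):
            last_def[labels[nid]] = nid
        elif isinstance(node, ast.Name) and node.id in last_def:
            edges.append((last_def[node.id], nid, "data_flow"))
    return labels, edges

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def gru_cell(a, h, P):
    """Standard GRU: gates z, r; h' = (1 - z) * h + z * tanh(Wh a + Uh (r * h))."""
    z = [1 / (1 + math.exp(-t)) for t in vadd(matvec(P["Wz"], a), matvec(P["Uz"], h))]
    r = [1 / (1 + math.exp(-t)) for t in vadd(matvec(P["Wr"], a), matvec(P["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(t) for t in vadd(matvec(P["Wh"], a), matvec(P["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def propagate(states, edges, edge_W, P, steps=2):
    """Gated-GNN-style steps: sum typed messages along edges, then GRU-update each node."""
    for _ in range(steps):
        msgs = [[0.0] * len(h) for h in states]
        for src, dst, etype in edges:
            msgs[dst] = vadd(msgs[dst], matvec(edge_W[etype], states[src]))
        states = [gru_cell(m, h, P) for m, h in zip(msgs, states)]
    return states

def random_setup(labels, d=4, seed=0):
    """Seeded random stand-ins for trained weights and word2vec node vectors."""
    rng = random.Random(seed)
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
    P = {k: mat() for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
    edge_W = {t: mat() for t in EDGE_TYPES}
    vocab = {lab: [rng.uniform(-1, 1) for _ in range(d)] for lab in sorted(set(labels))}
    return [vocab[lab] for lab in labels], edge_W, P

SAMPLE = """
def copy(buf, n):
    size = n
    out = buf[:size]
    return out
"""

labels, edges = build_exast(SAMPLE)
init, edge_W, P = random_setup(labels)
states = propagate(init, edges, edge_W, P)
# Hypothetical readout: mean-pool node states into one function-level vector.
graph_vec = [sum(col) / len(states) for col in zip(*states)]
print(sorted((labels[src], labels[dst]) for src, dst, kind in edges if kind == "data_flow"))
# -> [('buf', 'buf'), ('n', 'n'), ('out', 'out'), ('size', 'size')]
```

Each node's new state blends its old state with a candidate state through the update gate `z`; this is the gating the abstract credits with easing vanishing gradients across propagation steps. A trained classifier (not shown) would then map `graph_vec` to a vulnerable or non-vulnerable label for the function.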

Key words: Vulnerability mining, Graph neural network, Deep learning, Abstract syntax tree, Gated recurrent unit

CLC Number: TP309