Computer Science (计算机科学), 2023, Vol. 50, Issue (6): 283-290. doi: 10.11896/jsjkx.220600131
GU Shouke (顾守珂), CHEN Wen (陈文)
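Abstract: Software vulnerabilities increase year by year, and security problems are becoming ever more serious. Detecting vulnerabilities in the source code during the delivery phase of a software project can effectively prevent security flaws from surfacing later at runtime, and code vulnerability detection relies on an effective code representation. Traditional representations based on software metrics correlate only weakly with vulnerabilities, so they represent vulnerability information poorly. In recent years, machine learning has offered new ideas for intelligent vulnerability discovery, but such methods can also miss key code features. To address these problems, this paper adds control-dependence, data-dependence, and statement-sequence edges to the traditional abstract syntax tree (AST) to build an enhanced abstract syntax tree (EXAST) graph. This graph represents the source code so that its structural information is better preserved, and the word embedding algorithm Word2Vec initializes the code information as numeric vectors that a machine can recognize and learn from. In addition, a gated recurrent unit (GRU) is introduced into a conventional graph neural network (GNN) to build a graph classification model that alleviates vanishing gradients and strengthens the propagation of long-range information through the graph structure. This better captures the temporal order of code execution and improves detection accuracy. Finally, comparative experiments on the public SARD dataset show that the model achieves function-level code vulnerability detection; compared with traditional vulnerability detection methods, it improves accuracy by up to 32.54% and the F1 score by up to 44.99. The experimental results demonstrate the effectiveness of the proposed method for code vulnerability detection.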
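To make the GRU-gated propagation over an EXAST graph more concrete, here is a minimal pure-Python sketch in the style of a gated graph neural network (GGNN). It is not the authors' implementation, and the abstract does not specify their architecture. The following are all assumptions made for illustration: the edge-type names, the hand-written toy graph for a `strcpy` call, the random vectors standing in for Word2Vec embeddings, the number of propagation steps, and the sum readout with a logistic output. In each step, every node receives messages along its incoming AST, control-dependence, data-dependence, and statement-sequence edges, using one weight matrix per edge type. A GRU then merges that message with the node's previous state.

```python
import math
import random

# Edge types assumed for illustration: AST edges plus the control-dependence,
# data-dependence and statement-sequence edges that EXAST adds.
EDGE_TYPES = ("ast", "control", "data", "next_stmt")

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def vadd(*vs):
    # element-wise sum of equally sized vectors
    return [sum(xs) for xs in zip(*vs)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GatedGraphLayer:
    """One GGNN-style step: per-edge-type messages, then a GRU state update."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.w_edge = {t: rand_matrix(dim, dim, rng) for t in EDGE_TYPES}
        # GRU weights for the update gate z, reset gate r and candidate state
        self.wz, self.uz = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.wr, self.ur = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.wh, self.uh = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)

    def aggregate(self, h, edges):
        # a_v = sum over incoming edges (u -> v, type t) of W_t h_u
        msgs = [[0.0] * self.dim for _ in h]
        for src, dst, etype in edges:
            msgs[dst] = vadd(msgs[dst], matvec(self.w_edge[etype], h[src]))
        return msgs

    def gru(self, a, h):
        z = [sigmoid(x) for x in vadd(matvec(self.wz, a), matvec(self.uz, h))]
        r = [sigmoid(x) for x in vadd(matvec(self.wr, a), matvec(self.ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(x) for x in vadd(matvec(self.wh, a), matvec(self.uh, rh))]
        # h' = (1 - z) * h + z * cand: a gated mix of old state and candidate
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def step(self, h, edges):
        msgs = self.aggregate(h, edges)
        return [self.gru(a, hv) for a, hv in zip(msgs, h)]

def classify(h0, edges, layer, w_out, steps=3):
    """Propagate, sum node states into a graph vector, return P(vulnerable)."""
    h = h0
    for _ in range(steps):
        h = layer.step(h, edges)
    g = vadd(*h)
    return sigmoid(sum(w * x for w, x in zip(w_out, g)))

if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Hypothetical toy graph for: void f(char *s) { char buf[8]; strcpy(buf, s); }
    nodes = ["FunctionDef", "Param:s", "Decl:buf", "Call:strcpy"]
    # Random vectors stand in for Word2Vec token embeddings.
    emb = {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in nodes}
    h0 = [emb[tok] for tok in nodes]
    edges = [(0, 1, "ast"), (0, 2, "ast"), (0, 3, "ast"),
             (0, 2, "control"), (0, 3, "control"),
             (2, 3, "next_stmt"), (1, 3, "data"), (2, 3, "data")]
    layer = GatedGraphLayer(dim, rng)
    w_out = [rng.uniform(-1, 1) for _ in range(dim)]
    print(f"P(vulnerable) = {classify(h0, edges, layer, w_out):.3f}")
```

Each new state is a gated mix of the old state and a `tanh` candidate, so node states stay within [-1, 1] when the initial embeddings do. This gating is also what lets information travel across several propagation steps without the signal being overwritten at every step. The weights here are untrained, so the printed probability is arbitrary; a real model would learn them from labelled functions, such as those in SARD.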