Computer Science (计算机科学), 2020, Vol. 47, Issue 5: 295-300. doi: 10.11896/jsjkx.190800046
GONG Kou-lin (龚扣林), ZHOU Yu (周宇), DING Li (丁笠), WANG Yong-chao (王永超)
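Abstract: As computer technology is applied ever more widely, the number of software systems and the requirements placed on them keep growing, and development is becoming increasingly difficult. Code reuse and the inherent complexity of code inevitably introduce large numbers of vulnerabilities into software. Hidden in massive code bases, these vulnerabilities are hard to find, yet once exploited they can cause irreparable economic losses. To detect software vulnerabilities in a timely manner, this paper first extracts method bodies from source code to form a method set. An abstract syntax tree (AST) is built for every method in the set and used to extract the method's statements, forming a statement set. Programmer-defined variable names, method names and strings in the statement set are then replaced with placeholders, and each statement is assigned a unique node number, forming a node set. Next, data-flow and control-flow analysis is used to extract the data dependences and control dependences between nodes. The node set of each method body is then combined with these data and control dependences into a feature representation of the method, which is converted into a feature matrix by one-hot encoding. Finally, each matrix is labeled according to whether the method contains a vulnerability, producing the training samples, and a neural network is trained on them to obtain a vulnerability classification model. To better capture the context of the sequence, a bidirectional long short-term memory network (BiLSTM) is used, with an attention layer added on top of it to further improve model performance. In the experiments, vulnerability detection reaches a precision of 95.3% and a recall of 93.5%, showing that the proposed method detects security vulnerabilities in code fairly accurately.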
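The placeholder-replacement and node-numbering step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes each statement has already been extracted from the AST as a source string, uses a regular-expression tokenizer instead of a real parser, and treats the KEEP whitelist and the VARn/FUNn/STR placeholder names as hypothetical choices.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical whitelist: C keywords and library functions kept verbatim.
KEEP = {"int", "char", "void", "if", "else", "for", "while", "return",
        "sizeof", "strcpy", "memcpy", "malloc", "free", "printf"}

# String literal | identifier | integer | any other single non-space character.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
IDENT = re.compile(r"[A-Za-z_]\w*")


def normalize(statements: List[str]) -> List[Tuple[int, List[str]]]:
    """Replace user-defined names and strings, and number each statement."""
    funcs: Dict[str, str] = {}   # user-defined function name -> FUNn
    names: Dict[str, str] = {}   # user-defined variable name -> VARn
    nodes: List[Tuple[int, List[str]]] = []
    for node_id, stmt in enumerate(statements, start=1):
        tokens = TOKEN.findall(stmt)
        out: List[str] = []
        for i, tok in enumerate(tokens):
            if tok.startswith('"'):
                out.append("STR")  # every string literal becomes one symbol
            elif IDENT.fullmatch(tok) and tok not in KEEP:
                # An identifier directly followed by "(" is treated as a call.
                is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
                table, prefix = (funcs, "FUN") if is_call else (names, "VAR")
                table.setdefault(tok, f"{prefix}{len(table) + 1}")
                out.append(table[tok])
            else:
                out.append(tok)
        nodes.append((node_id, out))
    return nodes
```

For example, normalizing the three statements 'char buf[10];', 'strcpy(buf, input);' and 'log_msg("copied");' gives node 2 as (2, ['strcpy', '(', 'VAR1', ',', 'VAR2', ')', ';']): the same variable keeps the same placeholder across statements, so the mapping is consistent within a method.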
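The data-dependence extraction mentioned in the abstract can be illustrated with a simple def-use pass. This sketch assumes the sets of variables defined and used by each node are already known (for instance, read off the AST), and it deliberately ignores branches and loops, so it is only exact for straight-line code; control dependences, which require a control-flow graph and post-dominator analysis, are not shown. The function name data_dependences is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def data_dependences(defs: List[Set[str]], uses: List[Set[str]]) -> List[Tuple[int, int]]:
    """Return unique (src, dst) node-id pairs where node dst reads a variable last written at src.

    Straight-line simplification: nodes are assumed to execute in order 1..n,
    so branches and loops (multiple reaching definitions) are not modeled.
    """
    last_def: Dict[str, int] = {}   # variable -> node id of its latest definition
    edges: List[Tuple[int, int]] = []
    for node_id, (defined, used) in enumerate(zip(defs, uses), start=1):
        # Resolve uses before recording this node's own definitions, so that
        # "VAR1 = VAR1 + 1" depends on the previous definition of VAR1.
        for var in sorted(used):
            if var in last_def and (last_def[var], node_id) not in edges:
                edges.append((last_def[var], node_id))
        for var in defined:
            last_def[var] = node_id
    return edges
```

With defs [{"VAR1"}, {"VAR2"}, {"VAR1"}, set()] and uses [set(), {"VAR1"}, {"VAR2"}, {"VAR1", "VAR2"}], the edges are [(1, 2), (2, 3), (3, 4), (2, 4)]: node 4 reads the redefinition of VAR1 at node 3, not the original one at node 1.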
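The abstract does not specify how the node set and the two kinds of dependence are laid out before one-hot encoding, so the following is one possible, hypothetical serialization: each node's tokens are prefixed with a node symbol Nk, data and control dependences are appended as DD/CD triples, and every symbol becomes a one-hot row over a vocabulary built from the training sequences. Sequences are truncated or zero-padded to a fixed max_len so that every method yields a matrix of the same shape.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def serialize(nodes: List[Tuple[int, List[str]]], data_deps: List[Edge],
              ctrl_deps: List[Edge]) -> List[str]:
    """Flatten one method into a symbol sequence (hypothetical layout)."""
    seq: List[str] = []
    for node_id, tokens in nodes:
        seq += [f"N{node_id}"] + tokens
    for kind, edges in (("DD", data_deps), ("CD", ctrl_deps)):
        for src, dst in edges:
            seq += [kind, f"N{src}", f"N{dst}"]
    return seq


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct symbol an index; index 0 is reserved for unknowns."""
    vocab: Dict[str, int] = {"<UNK>": 0}
    for seq in sequences:
        for sym in seq:
            vocab.setdefault(sym, len(vocab))
    return vocab


def one_hot_matrix(seq: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Return a max_len x |vocab| 0/1 matrix, truncated or zero-padded."""
    matrix: List[List[int]] = []
    for sym in seq[:max_len]:
        row = [0] * len(vocab)
        row[vocab.get(sym, 0)] = 1   # symbols unseen in training map to <UNK>
        matrix.append(row)
    matrix += [[0] * len(vocab) for _ in range(max_len - len(matrix))]
    return matrix
```

Each resulting matrix, paired with a vulnerable/non-vulnerable label, forms one training sample; padding rows are all zeros, so they carry no symbol.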
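A BiLSTM with an attention layer on top is commonly formulated as below. The abstract does not state the exact attention variant or output layer, so this is an assumption: it uses additive attention pooling with learned parameters W, b and context vector u_w, and a sigmoid output for the binary vulnerable/non-vulnerable label. Here x_t is the t-th row of a method's one-hot feature matrix and T is the number of rows.

```latex
\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}\big(x_t, \overrightarrow{h}_{t-1}\big), \qquad
\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\big(x_t, \overleftarrow{h}_{t+1}\big), \qquad
h_t = \big[\overrightarrow{h}_t ; \overleftarrow{h}_t\big]

u_t = \tanh(W h_t + b), \qquad
\alpha_t = \frac{\exp\big(u_t^{\top} u_w\big)}{\sum_{k=1}^{T} \exp\big(u_k^{\top} u_w\big)}, \qquad
s = \sum_{t=1}^{T} \alpha_t h_t

\hat{y} = \sigma\big(w_o^{\top} s + b_o\big)
```

The concatenation h_t gives every position access to both preceding and following statements, and the weights alpha_t let the classifier emphasize the positions most indicative of a vulnerability instead of relying only on the final hidden state.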
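For reading the reported results: with TP the vulnerable methods correctly flagged, FP the non-vulnerable methods wrongly flagged, and FN the vulnerable methods missed, precision and recall are defined as below. The F1 score on the last line is computed here from the two reported figures and is not stated in the abstract.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2 \times 0.953 \times 0.935}{0.953 + 0.935} \approx 0.944
```

A precision of 95.3% thus means roughly 95 of every 100 methods flagged as vulnerable actually are, and a recall of 93.5% means roughly 93 to 94 of every 100 truly vulnerable methods are found.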