Computer Science ›› 2020, Vol. 47 ›› Issue (5): 295-300. doi: 10.11896/jsjkx.190800046

• Information Security •

Vulnerability Detection Using Bidirectional Long Short-term Memory Networks

GONG Kou-lin, ZHOU Yu, DING Li, WANG Yong-chao   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    Ministry Key Laboratory for Safety-critical Software Development and Verification, Nanjing 211100, China
  • Received:2019-08-09 Online:2020-05-15 Published:2020-05-19
  • About author: GONG Kou-lin, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include software evolution analysis and mining software repositories.
    ZHOU Yu, born in 1981, Ph.D., professor, is a member of China Computer Federation. His main research interests include software evolution analysis, mining software repositories, software architecture, and reliability analysis.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61972197) and the Fundamental Research Funds for the Central Universities (NS2019055).

Abstract: As computer applications continue to spread, the amount of software and the demand for it keep growing, and development becomes increasingly difficult. Code reuse and the complexity of the code itself inevitably introduce vulnerabilities into software. These vulnerabilities are hidden in massive amounts of code and are hard to find, but once they are exploited they can cause irreparable economic losses. To discover software vulnerabilities in time, this paper first extracts method bodies from the source code to form a method set and constructs an abstract syntax tree for each method in the set. The statements in each method are extracted through the abstract syntax tree to form a statement set, user-defined variable names, method names and strings are replaced with uniform identifiers, and each statement is assigned a separate node number to form a node set. Second, data flow and control flow analysis are used to extract the data dependencies and control dependencies between nodes. The node set extracted from the method body is then combined with the inter-node data dependencies and control dependencies into a feature representation of the method, which is further processed into a feature matrix by one-hot encoding. Finally, each matrix is labeled with a vulnerability tag to generate training samples, and a neural network is trained on them to obtain the corresponding vulnerability classification model. To better learn the context information of the sequence, a BiLSTM network is chosen and an Attention layer is added to further improve the performance of the model. In the experiments, the accuracy and recall of the vulnerability detection results reach 95.3% and 93.5% respectively, which confirms that the proposed method can detect security vulnerabilities in code more accurately.
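
The identifier replacement and one-hot encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular-expression tokenizer stands in for the abstract syntax tree, the placeholder names (VAR0, FUN0, STR) and the keyword and library-call lists are assumptions made for illustration, and the data and control dependency extraction is omitted.

```python
import re

# Assumption: a tiny C-like token grammar used instead of a real AST parser.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
# Assumption: illustrative lists only; the paper does not enumerate them.
KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen"}


def normalize(statements):
    """Replace user-defined names and string literals with uniform placeholders."""
    var_map, func_map = {}, {}
    rows = []
    for stmt in statements:
        tokens = TOKEN_RE.findall(stmt)
        row = []
        for i, tok in enumerate(tokens):
            if tok.startswith('"'):
                row.append("STR")  # every string literal maps to one symbol
            elif (tok[0].isalpha() or tok[0] == "_") and \
                    tok not in KEYWORDS and tok not in LIBRARY_CALLS:
                # A name followed by "(" is treated as a user-defined function.
                is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
                table, prefix = (func_map, "FUN") if is_call else (var_map, "VAR")
                row.append(table.setdefault(tok, f"{prefix}{len(table)}"))
            else:
                row.append(tok)  # keywords, library calls, numbers, punctuation
        rows.append(row)
    return rows


def one_hot(rows):
    """Build a vocabulary over all tokens and emit one one-hot vector per token."""
    vocab = {}
    for row in rows:
        for tok in row:
            vocab.setdefault(tok, len(vocab))
    matrix = []
    for row in rows:
        for tok in row:
            vec = [0] * len(vocab)
            vec[vocab[tok]] = 1
            matrix.append(vec)
    return vocab, matrix
```

Calling normalize on the statements `char buf[10];` and `strcpy(buf, src);` maps `buf` to the same placeholder in both statements, so the resulting matrix no longer depends on the names a developer happened to choose. In the paper's pipeline, a matrix of this kind, labeled with a vulnerability tag, would be the input to the BiLSTM classifier.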

Key words: Vulnerability detection, Feature representation, BiLSTM, Attention, Classification model

CLC Number: TP305