Computer Science ›› 2020, Vol. 47 ›› Issue (5): 295-300. doi: 10.11896/jsjkx.190800046

• Information Security •

Vulnerability Detection Using Bidirectional Long Short-term Memory Networks

GONG Kou-lin, ZHOU Yu, DING Li, WANG Yong-chao   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
    Ministry Key Laboratory for Safety-critical Software Development and Verification, Nanjing 211100, China
  • Received: 2019-08-09 Online: 2020-05-15 Published: 2020-05-19
  • About author: GONG Kou-lin, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include software evolution analysis and mining software repositories.
    ZHOU Yu, born in 1981, Ph.D., professor, is a member of China Computer Federation. His main research interests include software evolution analysis, mining software repositories, software architecture, and reliability analysis.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61972197) and the Fundamental Research Funds for the Central Universities (NS2019055).

Abstract: As computer technology is applied ever more widely, the number of software systems and the demand for them keep growing, and development becomes steadily more difficult. Code reuse and the complexity of the code itself inevitably introduce vulnerabilities into software. Hidden in massive codebases, these vulnerabilities are hard to find, yet once exploited they can cause irreparable economic losses. To discover software vulnerabilities in time, this paper first extracts the method bodies from the source code to form a method set, and then constructs an abstract syntax tree for each method in the set. The statements of each method are extracted via the abstract syntax tree to form a statement set. User-defined variable names, method names, and strings are replaced with uniform identifiers, and each statement is assigned its own node number to form a node set. Secondly, data flow and control flow analysis are used to extract the data dependencies and control dependencies between nodes. The node set extracted from the method body, together with the inter-node data dependencies and control dependencies, is then combined into a feature representation of the method, which is further processed into a feature matrix using one-hot encoding. Finally, each matrix is labeled with a vulnerability tag to generate training samples, and a neural network is trained on them to obtain the corresponding vulnerability classification model. To better learn the context information of the sequence, a BiLSTM network is selected, and an Attention layer is added to further improve the performance of the model. In the experiment, the accuracy and recall of the vulnerability detection results reach 95.3% and 93.5% respectively, which confirms that the proposed method can detect security vulnerabilities in code more accurately.
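
The identifier replacement and one-hot encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the placeholder names (`VAR1`, `FUN1`, `STR`), the regex tokenizer standing in for the abstract syntax tree, the keyword and library-call lists, and the function names `normalize` and `one_hot` are all assumptions made for illustration, and the data and control dependency edges are left out.

```python
import re

# Assumed, illustrative lists: names kept as-is rather than replaced.
KEYWORDS = {"int", "char", "if", "else", "for", "while", "return", "void"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen"}

# String literal, identifier, integer, or any single non-space character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(statements):
    """Replace user-defined names and string literals with uniform placeholders."""
    var_ids, fun_ids = {}, {}
    normalized = []
    for stmt in statements:
        tokens = TOKEN_RE.findall(stmt)
        row = []
        for i, tok in enumerate(tokens):
            if tok.startswith('"'):
                row.append("STR")
            elif (IDENT_RE.fullmatch(tok)
                  and tok not in KEYWORDS and tok not in LIBRARY_CALLS):
                # An identifier directly followed by "(" is treated as a method name.
                is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
                table, prefix = (fun_ids, "FUN") if is_call else (var_ids, "VAR")
                row.append(table.setdefault(tok, f"{prefix}{len(table) + 1}"))
            else:
                row.append(tok)
        normalized.append(row)
    return normalized


def one_hot(token_rows):
    """Return the vocabulary and one one-hot row per token, in statement order."""
    vocab = sorted({tok for row in token_rows for tok in row})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for row in token_rows:
        for tok in row:
            vec = [0] * len(vocab)
            vec[index[tok]] = 1
            matrix.append(vec)
    return vocab, matrix


# Hypothetical method body, one statement per node.
statements = ['char buf[10];', 'strcpy(buf, input);', 'log_msg("done");']
normalized = normalize(statements)
vocab, matrix = one_hot(normalized)
```

In a pipeline like the one the abstract describes, matrices of this kind would be labeled and fed to the BiLSTM classifier; how the dependency relations are encoded alongside the tokens is not specified in the abstract and is not modeled here.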

Key words: Attention, BiLSTM, Classification model, Feature representation, Vulnerability detection

CLC Number: 

  • TP305