Computer Science ›› 2019, Vol. 46 ›› Issue (7): 86-90. doi: 10.11896/j.issn.1002-137X.2019.07.013

• Information Security •

Study on Malicious Program Detection Based on Recurrent Neural Network

WANG Le-le (1), WANG Bin-qiang (1), LIU Jian-gang (2), ZHANG Jian-hui (1), MIAO Qi-guang (3)

  1. National Digital Switching System Engineering and Technological Research Center, Zhengzhou 450000, China
  2. Nanjing Information Technology Institute, Nanjing 210000, China
  3. Department of Computer Science, Xidian University, Xi’an 710071, China
  • Received: 2018-09-08 Online: 2019-07-15 Published: 2019-07-15

Abstract: Traditional malicious program detection is inefficient and offers little automated analysis of malicious programs. To address this, this paper studies the use of recursive neural networks, within a deep learning framework, to detect and classify malicious programs. First, QEMU is used to capture the APIs, together with their parameter sequences, that a malicious program calls at run time; after behavior abstraction, these calls form the feature sequence of the program. Next, the feature sequence is mapped to fixed-length word vectors using a hierarchical log-bilinear language model (HLBL), and the word vectors are assembled into the input matrix of a recursive neural network (RNN). Training the recursive neural network yields a multi-level semantic aggregation model of malicious programs, which performs the classification and detection. The experimental data show that the recursive neural network model detects malicious programs effectively: its detection rate is 17% higher than that of traditional machine learning algorithms. In particular, introducing tensors through the Recursive Neural Tensor Network (RNTN) model reduces the overall number of parameters and the amount of computation, and raises the detection rate by 7% relative to the RNN model. The experimental data indicate that the recursive neural network model can detect and classify malicious programs in a big data environment.
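
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the tree shape, vector dimension, or initialization, so the left-to-right fold, the dimension `d = 4`, the random weights, and all function names below are assumptions made for illustration. The two composition functions follow the commonly published formulation of recursive neural networks and recursive neural tensor networks, and the random `api_vectors` stand in for the HLBL word vectors of the abstracted API calls. The final classification layer is omitted.

```python
import math
import random


def rnn_compose(left, right, W, bias):
    """Plain recursive composition: p = tanh(W [left; right] + bias)."""
    x = left + right  # list concatenation, length 2d
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b_k)
            for row, b_k in zip(W, bias)]


def rntn_compose(left, right, V, W, bias):
    """Tensor composition: p_k = tanh(x^T V[k] x + (W x)_k + bias_k)."""
    x = left + right
    n = len(x)
    out = []
    for V_k, row, b_k in zip(V, W, bias):
        # Bilinear term: one 2d x 2d tensor slice per output dimension.
        bilinear = sum(x[i] * V_k[i][j] * x[j]
                       for i in range(n) for j in range(n))
        linear = sum(w * xi for w, xi in zip(row, x))
        out.append(math.tanh(bilinear + linear + b_k))
    return out


def aggregate(vectors, compose):
    """Fold a sequence of d-dimensional vectors into one vector.

    Assumption: a left-branching tree, i.e. each step merges the running
    summary with the next vector in the sequence.
    """
    state = vectors[0]
    for v in vectors[1:]:
        state = compose(state, v)
    return state


def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


random.seed(0)
d = 4                                         # hypothetical vector size
W = rand_matrix(d, 2 * d)                     # shared composition matrix
bias = [0.0] * d
V = [rand_matrix(2 * d, 2 * d) for _ in range(d)]  # tensor slices

# Stand-ins for the word vectors of five abstracted API calls.
api_vectors = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(5)]

p_rnn = aggregate(api_vectors, lambda a, b: rnn_compose(a, b, W, bias))
p_rntn = aggregate(api_vectors, lambda a, b: rntn_compose(a, b, V, W, bias))
print(p_rnn)
print(p_rntn)
```

In a full pipeline, the aggregated vector would be passed to a classifier that outputs the malicious program category, and the weights would be learned by training rather than drawn at random.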

Key words: Quick emulator (QEMU), Hierarchical log-bilinear language model, Word vector, Recursive neural network, Multi-level semantic aggregation model

CLC Number: TP393