Computer Science, 2019, Vol. 46, Issue (7): 86-90. doi: 10.11896/j.issn.1002-137X.2019.07.013
王乐乐1,汪斌强1,刘建港2,张建辉1,苗启广3
WANG Le-le1,WANG Bin-qiang1,LIU Jian-gang2,ZHANG Jian-hui1,MIAO Qi-guang3
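Abstract: To address the low detection efficiency of traditional malware detection and its limited capacity for automated malware analysis, this paper studies the use of recursive neural networks for malware detection and classification in a deep learning setting. First, the Quick Emulator (QEMU) is used to capture the APIs that a malicious program calls at run time, together with their parameter sequences; behavior abstraction then turns these into the program's feature sequence. Next, the Hierarchical Log-bilinear Language Model (HLBL) maps the feature sequence to fixed-length word vectors, which are assembled into the input matrix required by the Recursive Neural Network (RNN). Training the recursive neural network model yields a multi-layer semantic aggregation model of malware, which is then used to classify and detect malicious programs. Experimental results show that the recursive neural network model detects malware effectively, with a detection rate 17% higher than that of traditional machine learning algorithms. In particular, after the concept of a tensor is introduced and the Recursive Neural Tensor Network (RNTN) model is adopted, which reduces the overall number of parameters and the amount of computation, the detection rate improves by a further 7% over the RNN model. The experimental data indicate that recursive neural network models are fully capable of handling malware detection and classification in big-data environments.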
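The difference between the RNN and RNTN composition steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding size `d`, the random parameters, the helper names (`compose_rnn`, `compose_rntn`, `fold_sequence`, `rand_vec`), and the left-to-right merge order are all assumptions made for illustration, and the leaf vectors are random stand-ins for HLBL word vectors of API-call features. The RNN step computes a parent vector as tanh(W[a; b] + bias); the RNTN step adds a bilinear term x^T V[k] x for each output dimension k.

```python
import math
import random


def compose_rnn(left, right, W, b):
    """Plain recursive composition: p = tanh(W [left; right] + b)."""
    x = left + right  # concatenate the two child vectors (length 2d)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bk)
            for row, bk in zip(W, b)]


def compose_rntn(left, right, V, W, b):
    """Tensor composition: p_k = tanh(x^T V[k] x + (W x)_k + b_k)."""
    x = left + right
    n = len(x)
    parent = []
    for Vk, row, bk in zip(V, W, b):
        bilinear = sum(x[i] * Vk[i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(w * xi for w, xi in zip(row, x))
        parent.append(math.tanh(bilinear + linear + bk))
    return parent


def fold_sequence(leaves, compose):
    """Merge leaf vectors left to right into a single root vector.

    Assumption: a left-branching tree; the source does not state the tree shape.
    """
    root = leaves[0]
    for leaf in leaves[1:]:
        root = compose(root, leaf)
    return root


rng = random.Random(0)
d = 3  # hypothetical embedding size


def rand_vec(n):
    """Random vector standing in for learned parameters or word vectors."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


leaves = [rand_vec(d) for _ in range(4)]          # four embedded API-call features
W = [rand_vec(2 * d) for _ in range(d)]           # d x 2d weight matrix
b = rand_vec(d)                                   # bias vector
V = [[rand_vec(2 * d) for _ in range(2 * d)] for _ in range(d)]  # d slices of 2d x 2d

root_rnn = fold_sequence(leaves, lambda l, r: compose_rnn(l, r, W, b))
root_rntn = fold_sequence(leaves, lambda l, r: compose_rntn(l, r, V, W, b))
print("RNN root: ", root_rnn)
print("RNTN root:", root_rntn)
```

In a full classifier the root vector would be passed to a softmax layer and all parameters would be trained by backpropagation through the tree; both steps are omitted here to keep the sketch short.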