Computer Science, 2019, Vol. 46, Issue (12): 132-137. doi: 10.11896/jsjkx.181102171

• Information Security •

Attention Mechanism Based Detection of Malware Call Sequences

ZHANG Lan, LAI Yao, YE Xiao-jun   

  1. (School of Software, Tsinghua University, Beijing 100084, China)
  • Received: 2018-11-25 Online: 2019-12-15 Published: 2019-12-17

Abstract: Typical machine learning approaches, which learn a classifier from hand-crafted features, are not sufficiently robust: attackers can reorder malware code or insert useless code to evade detection. To address the large volume of malware, the advance of obfuscation techniques, and the cost of manually constructing features in the Internet environment, this paper proposes G2ATT, a malware detection approach based on API call sequences and the attention mechanism from natural language processing. First, dynamic API call sequences are extracted in a sandbox environment and split into several subsequences with a sliding window. Then, multi-instance learning and the attention mechanism are introduced to design a hierarchical feature extraction neural network: recurrent neural networks extract API-level features, and two attention mechanisms are combined to extract window-level and sequence-level features. The sequence-level features are then used for malware detection. Finally, the model is trained and used to detect malware. Experimental results on a real dataset show that the window-level feature extraction layer effectively learns attention scores within the subsequences. In addition, the sequence-level feature extraction layer improves the precision and recall of the detection model by computing attention scores across the subsequences. G2ATT achieves a detection accuracy of 98.19%, a precision of 98.78%, a recall of 97.60%, and an AUC (area under the ROC curve) of 99%, improving detection accuracy by 10% over other machine learning approaches based on API call sequences.
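
The two mechanical steps named in the abstract, splitting a call sequence with a sliding window and pooling feature vectors by attention scores, can be illustrated with a minimal Python sketch. This is not the G2ATT implementation: the function names, the window size and stride, and the dot-product scoring against a single `query` vector are all assumptions made for illustration, and the recurrent API-level encoder is omitted entirely.

```python
import math


def sliding_windows(calls, size, stride):
    """Split a call sequence into subsequences of at most `size` items.

    `size` and `stride` are illustrative parameters, not values from the paper.
    The last window may be shorter than `size`.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not calls:
        return []
    if len(calls) <= size:
        return [list(calls)]
    windows = []
    for start in range(0, len(calls) - size + stride, stride):
        windows.append(list(calls[start:start + size]))
    return windows


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Return the attention-weighted sum of `vectors` and the weights used.

    Scores are plain dot products with `query` (an assumed scoring function);
    in a trained model `query` would be a learned parameter.
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return pooled, weights


if __name__ == "__main__":
    # Hypothetical API call trace, used only to show the window split.
    trace = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtClose", "NtOpenKey"]
    for window in sliding_windows(trace, size=3, stride=2):
        print(window)
```

In a hierarchical design of the kind the abstract describes, a pooling step like `attention_pool` would be applied twice: once over the API-level vectors inside each window, and once over the resulting window-level vectors to obtain a single sequence-level vector for classification.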

Key words: API, Attention mechanism, Deep learning, Malware detection

CLC Number: TP309.5