Computer Science ›› 2024, Vol. 51 ›› Issue (4): 388-395. doi: 10.11896/jsjkx.230100002

• Information Security •

Android Malware Detection Method Based on GCN and BiLSTM

HE Jiaojun, CAI Manchun, LU Tianliang

  1. School of Information Cyber Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2023-01-03 Revised: 2023-04-03 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: CAI Manchun (caimanchun@ppsuc.edu.cn)
  • About the author: (hjj8138@163.com)
  • Supported by:
    Major Project of Basic Scientific Research Expenses of People's Public Security University of China in 2022 (2022JKF02009) and Key Project of Public Security Risk Prevention, Control and Emergency Technical Equipment (20200017).

Abstract: Most existing Android malware detection methods learn features of a single structural type and therefore fall short in analyzing application semantics. To address the incomplete capture of feature semantics by traditional detection methods, this paper proposes an Android malware detection model based on GCN and BiLSTM that extracts the structural information of samples while focusing on the semantics of malicious behavior. First, the topological relationships among 26 types of key system calls are represented as a graph, and a two-layer GCN aggregates the high-order structural information of the nodes in the system call graph, improving the efficiency of feature learning. Then, a BiLSTM network with a self-attention mechanism captures the contextual semantics of the opcode sequence; assigning high weights to sequences with malicious features exposes the strong correlations within the features. Finally, Softmax outputs the sample classification probability from the fused structural information and contextual features. In experiments on the Drebin and AndroZoo datasets, the proposed model achieves an accuracy of 93.95% and an F1 score of 97.09%, a significant improvement over the baseline algorithms, demonstrating that the model based on GCN and BiLSTM can effectively discriminate the nature of applications and improve Android malware detection.

Key words: Android, Malware detection, GCN, BiLSTM
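
The two-layer GCN step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the propagation rule, so this assumes the common symmetric-normalized form H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W) and treats the system call graph as undirected. The function names, the 3-node toy graph (the paper uses 26 system call types), and the identity weights are all hypothetical choices for illustration, not the authors' implementation.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, symmetric 0/1 matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features when aggregating.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def gcn_two_layer(adj, features, w1, w2):
    """Two rounds of neighbourhood aggregation (assumed propagation rule)."""
    a_norm = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a_norm, features), w1))
    return matmul(matmul(a_norm, hidden), w2)

def mean_pool(node_embeddings):
    """Collapse per-node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]

# Toy path graph 0 - 1 - 2: nodes 0 and 2 are NOT directly connected.
toy_adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# One-hot node features and identity weights (hypothetical, untrained).
embeddings = gcn_two_layer(toy_adj, identity, identity, identity)
graph_vector = mean_pool(embeddings)

# embeddings[0][2] is non-zero: after two layers node 0 has received
# information from its two-hop neighbour, node 2.
print(embeddings[0][2] > 0)
```

With one layer, node 0 only sees itself and node 1; the second layer is what carries two-hop ("higher-order") structure. In a trained model the weights would be learned, and the pooled graph vector would be one of the inputs fused before the Softmax classifier.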

CLC Number: 

  • TP309