Computer Science (计算机科学) ›› 2025, Vol. 52 ›› Issue (11A): 250200120-11. doi: 10.11896/jsjkx.250200120

• Information Security •

MOBSF_rule Based Android Malware Detection Method

CHEN Weiguo 1, ZHANG Gaofeng 2, JIA Sheng 1, XU Benzhu 2, ZHENG Liping 1

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
  2. School of Software, Hefei University of Technology, Hefei 230601, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Corresponding author: ZHANG Gaofeng (g.zhang@hfut.edu.cn)
  • About author: 2022111134@mail.hfut.edu.cn
  • Supported by:
    National Key R&D Program of China (2022YFC3900800).

Abstract: In static analysis for Android application security, one effective approach is to decompile an application with reverse-engineering tools and extract the Function Call Graph (FCG) from the decompiled code files as the primary feature for malware identification. FCG subgraphs built around sensitive APIs, in particular, have been widely validated. However, most existing work of this kind relies on older sensitive-API sets that have not been updated as the system APIs evolved. Experiments show that when traditional sensitive-API sets are used to extract feature nodes from an application's FCG, the required feature nodes often cannot be obtained. For example, successive Android releases have significantly adjusted and replaced APIs, and reflection (Reflect) and related techniques allow system APIs to be invoked dynamically and implicitly. To address this, and building on the latest comprehensive research framework for Android applications, this paper proposes an Android malware detection method that extracts FCG subgraphs using MOBSF_rule. The method first generates the FCG from the application's decompiled code files. It then uses the MOBSF_rule rule set to extract feature nodes, generates five-node, six-node, and seven-node graphs containing these feature nodes, and counts how often each subgraph configuration occurs. Finally, the frequency matrix is fed into a machine learning method for training and inference. Compared with existing sensitive-API sets, the proposed method has the following advantages. 1) The MOBSF_rule filtering rule set performs well at extracting feature nodes, effectively capturing key API features including reflection, component interaction, signature verification, network communication, and client/server (C/S) architecture communication. Compared with traditional sensitive-API sets, the effective rate of feature extraction on the latest malware dataset increases by 69.765%. 2) The MOBSF_rule rule set extracts feature nodes reliably across different time labels and is highly stable. It adapts to the continuous updates of the Android system and maintains a highly consistent feature extraction capability across versions. Over 2012-2022, compared with traditional sensitive-API sets, the overall multi-year variance of the feature extraction effective rate decreases by 98.747%. 3) The method adopts the Stacking ensemble learning approach; compared with the random forest ensemble learning method and the multilayer perceptron method, accuracy improves by 4.32%.

Key words: Android, Function call graph, Sensitive API, Reflection mechanism, Ensemble learning
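
The feature-extraction steps summarized in the abstract (match feature nodes by rule, enumerate small subgraphs that contain them, count each configuration) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two regular expressions in `RULES` are hypothetical stand-ins and not entries of the actual MOBSF_rule set, the toy call graph is invented, subgraphs are enumerated by brute force, and the sorted degree sequence is used as a cheap configuration signature rather than an exact isomorphism test. The paper's own procedure may differ on each of these points.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical rule patterns standing in for rule-set entries (not the real MOBSF_rule).
RULES = [
    re.compile(r"java/lang/reflect/Method;->invoke"),
    re.compile(r"java/net/HttpURLConnection;->connect"),
]


def feature_nodes(fcg):
    """Return every method in the call graph that matches at least one rule."""
    nodes = set(fcg) | {callee for callees in fcg.values() for callee in callees}
    return {n for n in nodes if any(rule.search(n) for rule in RULES)}


def undirected(fcg):
    """Turn the caller -> callees mapping into an undirected adjacency map."""
    adj = {}
    for caller, callees in fcg.items():
        adj.setdefault(caller, set())
        for callee in callees:
            adj[caller].add(callee)
            adj.setdefault(callee, set()).add(caller)
    return adj


def is_connected(nodes, adj):
    """Depth-first search restricted to the given node subset."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur] & nodes:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def subgraph_frequencies(fcg, k):
    """Count connected k-node subgraphs that contain a feature node.

    Each subgraph is keyed by its sorted degree sequence (an approximate
    signature chosen for brevity). Brute force: only suitable for toy graphs.
    """
    adj = undirected(fcg)
    feats = feature_nodes(fcg)
    counts = Counter()
    for combo in combinations(sorted(adj), k):
        if feats.isdisjoint(combo) or not is_connected(combo, adj):
            continue
        members = set(combo)
        signature = tuple(sorted(len(adj[n] & members) for n in combo))
        counts[signature] += 1
    return counts


# Invented toy call graph: a chain of five methods ending in two matched APIs.
toy_fcg = {
    "Lcom/app/Main;->onCreate": ["Lcom/app/Net;->fetch", "Lcom/app/Loader;->load"],
    "Lcom/app/Net;->fetch": ["Ljava/net/HttpURLConnection;->connect"],
    "Lcom/app/Loader;->load": ["Ljava/lang/reflect/Method;->invoke"],
}

freq3 = subgraph_frequencies(toy_fcg, 3)
freq5 = subgraph_frequencies(toy_fcg, 5)
print(freq3, freq5)
```

In a full pipeline, one such counter per application (for k = 5, 6, and 7) would be flattened into a row of the frequency matrix that is passed to the classifier. A real FCG would need a sampled or neighborhood-limited enumeration, since testing every k-node combination grows combinatorially with graph size.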

CLC Number: TP181
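
The stability claim in the abstract (a lower multi-year variance of the feature extraction effective rate) can be made concrete with a short Python sketch. The abstract does not define the effective rate, so the definition used here (the share of applications from which at least one feature node is extracted) is an assumption, as is reading the reported decrease as a relative reduction in variance. The yearly rates below are invented placeholders, not results from the paper.

```python
from statistics import pvariance


def effective_rate(extracted_counts):
    """Share of apps with at least one extracted feature node (assumed definition)."""
    return sum(1 for c in extracted_counts if c > 0) / len(extracted_counts)


def relative_variance_reduction(baseline_rates, new_rates):
    """Percentage by which the variance of yearly rates drops versus the baseline."""
    base_var = pvariance(baseline_rates)
    new_var = pvariance(new_rates)
    return 100.0 * (base_var - new_var) / base_var


# Invented yearly effective rates for a baseline API set and a newer rule set.
baseline_by_year = [0.9, 0.7, 0.5, 0.3]
rule_set_by_year = [0.9, 0.9, 0.8, 0.8]

reduction = relative_variance_reduction(baseline_by_year, rule_set_by_year)
print(f"variance reduction: {reduction:.1f}%")
```

When both series cover the same number of years, the choice between population and sample variance does not affect the ratio, because the normalizing factor cancels.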