Computer Science ›› 2019, Vol. 46 ›› Issue (2): 95-101. doi: 10.11896/j.issn.1002-137X.2019.02.015

• Information Security •

Android Malware Detection with Multi-dimensional Sensitive Features

XIE Nian-nian1, ZENG Fan-ping1,2, ZHOU Ming-song1, QIN Xiao-xia1, LV Cheng-cheng1, CHEN Zhao1

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China
  2. Anhui Province Key Lab of Software in Computing and Communication, Hefei 230026, China
  • Received: 2018-01-18 Online: 2019-02-25 Published: 2019-02-25
  • Corresponding author: ZENG Fan-ping (born 1967), male, Ph.D., associate professor. His main research interest is information security. E-mail: billzeng@ustc.edu.cn
  • About the authors: XIE Nian-nian (born 1993), female, master's student, main research interest: information security; ZHOU Ming-song (born 1994), male, master's student, main research interest: information security; QIN Xiao-xia (born 1994), female, master's student, main research interest: information security; LV Cheng-cheng (born 1994), male, master's student, main research interest: information security; CHEN Zhao (born 1995), male, master's student, main research interest: information security.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61772487).

Abstract: The behavioral semantics of an application play a key role in Android malware detection. To distinguish these semantics, this paper proposes features and a method suited to Android malware detection. It first defines the generalized-sensitive API, and stresses that detection should consider whether the trigger point of a generalized-sensitive API is related to a UI event, in combination with the permissions the application actually uses. The approach abstracts generalized-sensitive APIs and their trigger points as semantic features, extracts the really-used permissions as syntax features, and then applies machine-learning classification to automatically detect whether an application is malicious. Comparative experiments were conducted on 13226 samples. The results show that the approach analyzes applications quickly and with low overhead, and that the selected feature set yields good detection results. After comparing several machine-learning classifiers, Random Forest was chosen as the classification technique; with the proposed feature strategy, classification accuracy reaches 96.5%, AUC reaches 0.99, and precision on malware reaches 98.8%.

Key words: Android malware detection, Machine learning, Static analysis, Syntax feature, Semantic feature
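
To make the feature design in the abstract concrete, the following is a minimal sketch of how one app might be encoded and how accuracy and malware precision are computed. It is not the authors' implementation: the API and permission vocabularies, the function names `encode_app` and `accuracy_and_precision`, and the binary encoding are all assumptions made for illustration. The abstract does not list the actual generalized-sensitive APIs or the feature layout.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical vocabularies (assumptions): the paper's real lists are not given here.
SENSITIVE_APIS = ["sendTextMessage", "getDeviceId", "getLastKnownLocation"]
PERMISSIONS = ["SEND_SMS", "READ_PHONE_STATE", "ACCESS_FINE_LOCATION"]


def encode_app(api_calls: Iterable[Tuple[str, bool]],
               used_permissions: Iterable[str]) -> List[int]:
    """Encode one app as a binary feature vector.

    api_calls: (api_name, triggered_by_ui_event) pairs from static analysis.
    used_permissions: permissions the app's code actually exercises.
    """
    calls = set(api_calls)
    perms = set(used_permissions)
    vector: List[int] = []
    # Semantic features: two slots per API, UI-triggered and non-UI-triggered.
    for api in SENSITIVE_APIS:
        vector.append(1 if (api, True) in calls else 0)
        vector.append(1 if (api, False) in calls else 0)
    # Syntax features: one slot per really-used permission.
    for perm in PERMISSIONS:
        vector.append(1 if perm in perms else 0)
    return vector


def accuracy_and_precision(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, precision for the malware class, labeled 1)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return accuracy, precision


# An app that sends SMS without any UI event and reads the device ID on a tap.
features = encode_app(
    [("sendTextMessage", False), ("getDeviceId", True)],
    ["SEND_SMS"],
)
metrics = accuracy_and_precision([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

Vectors of this shape could then be passed to any off-the-shelf classifier, such as the random forest the abstract reports choosing. Giving UI-triggered and non-UI-triggered calls separate slots is one simple way to let a classifier tell a user-initiated SMS apart from a silent one.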

CLC Number:

  • TP311