Computer Science, 2019, Vol. 46, Issue (2): 145-151. doi: 10.11896/j.issn.1002-137X.2019.02.023

• Information Security •

Android Malware Detection Based on N-gram

ZHANG Zong-mei1, GUI Sheng-lin1,2, REN Fei2

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  2. The 30th Institute of China Electronics Technology Group Corporation, Chengdu 610041, China
  • Received: 2018-01-18 Online: 2019-02-25 Published: 2019-02-25
  • Corresponding author: ZHANG Zong-mei (born 1993), male, master's student; main research interest: Android security; E-mail: zach_41@163.com. GUI Sheng-lin (born 1983), male, Ph.D., associate professor, CCF member; main research interest: cyberspace security; E-mail: shenglin_gui@uestc.edu.cn
  • About the author: REN Fei (born 1983), male, M.S., senior engineer; main research interest: cryptographic algorithms.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61401067).

Abstract: With the widespread use of the Android operating system, malicious applications are constantly emerging on the Android platform, and the means by which they evade existing detection tools are becoming increasingly sophisticated. More effective detection techniques are therefore needed to analyze malicious behavior. This paper proposes and designs a static malware detection model based on N-gram. The model decompiles Android APK files through reverse engineering and uses N-gram to extract features from the bytecode, which avoids the dependence on expert knowledge found in traditional detection. The model also uses a deep belief network, which allows it to train quickly and accurately. Tests on 1267 malicious samples and 1200 benign samples show that the overall detection accuracy reaches up to 98.34%. The experiments further compare the model's detection results with those of other machine learning algorithms and with those of related work; the results indicate that the model achieves better accuracy and robustness.

Key words: Android application, Deep belief network, Malware detection, N-gram, Static detection
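
The N-gram feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the opcode names, the helper functions, and the choice of a binary presence vector are all assumptions made for illustration (binary inputs are a common fit for the restricted Boltzmann machines that make up a deep belief network, but the abstract does not state the encoding used).

```python
from collections.abc import Iterable, Sequence


def extract_ngrams(opcodes: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Slide a window of length n over an opcode sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_vocabulary(samples: Iterable[Sequence[str]], n: int) -> list[tuple[str, ...]]:
    """Collect every distinct n-gram seen across all samples, in sorted order."""
    vocab: set[tuple[str, ...]] = set()
    for opcodes in samples:
        vocab.update(extract_ngrams(opcodes, n))
    return sorted(vocab)


def to_binary_vector(opcodes: Sequence[str],
                     vocab: Sequence[tuple[str, ...]],
                     n: int) -> list[int]:
    """Encode a sample as 1/0 flags: does each vocabulary n-gram occur in it?"""
    present = set(extract_ngrams(opcodes, n))
    return [1 if gram in present else 0 for gram in vocab]


# Hypothetical, hand-written opcode sequences (not taken from the paper's data set).
benign = ["const", "invoke", "move", "return"]
suspect = ["const", "invoke", "move", "invoke", "goto"]

vocab = build_vocabulary([benign, suspect], n=3)
benign_vec = to_binary_vector(benign, vocab, 3)
suspect_vec = to_binary_vector(suspect, vocab, 3)
```

In a pipeline of the kind the abstract describes, the opcode sequences would come from the decompiled APK, and vectors such as `benign_vec` and `suspect_vec` would be the input to the classifier. The number of distinct N-grams grows quickly with `n`, so a real system would likely need some form of feature selection, which this sketch omits.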

CLC Number:

  • TP309.5