Android Malware Detection Based on N-gram

doi:10.11896/j.issn.1002-137X.2019.02.023

Computer Science, 2019, Vol. 46, Issue (2): 145-151.

ZHANG Zong-mei¹, GUI Sheng-lin¹,², REN Fei²

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China¹
The 30th Institute of China Electronics Technology Group Corporation, Chengdu 610041, China²

Received: 2018-01-18 Online: 2019-02-25 Published: 2019-02-25
Corresponding author: ZHANG Zong-mei (born 1993), male, master's student; main research interest: Android security. E-mail: zach_41@163.com. GUI Sheng-lin (born 1983), male, Ph.D., associate professor, CCF member; main research interest: cyberspace security. E-mail: shenglin_gui@uestc.edu.cn
About the author: REN Fei (born 1983), male, M.S., senior engineer; main research interest: cryptographic algorithms.
Funding:
This work was supported by the National Natural Science Foundation of China (61401067).

Abstract

Abstract: With the widespread use of the Android operating system, malicious applications keep emerging on the Android platform, and the techniques they use to evade existing detection tools are becoming increasingly sophisticated. More effective detection techniques are therefore urgently needed to analyze malicious behavior. This paper proposes and designs a static malware detection model based on N-gram. The model decompiles Android APK files through reverse engineering and uses N-gram to extract features from the bytecode, which avoids the reliance on expert knowledge found in traditional detection. The model also uses a deep belief network, which allows it to train quickly and accurately. Tests on 1267 malicious samples and 1200 benign samples show that the overall detection accuracy reaches up to 98.34%. The experiments further compare the model with other machine learning algorithms and with the detection results of related work; the results show that the model has better accuracy and robustness.

Key words: Android application, Deep belief network, Malware detection, N-gram, Static detection
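
The feature-extraction step named in the abstract (N-gram over decompiled bytecode) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the value of N, the exact token unit, or the feature encoding, so the sketch assumes that each method has already been reduced to a list of Dalvik opcode names, that windows do not cross method boundaries, and that features are binary presence flags. The function names and the toy sample are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Gram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[Gram]:
    """Return every run of n consecutive opcodes (sliding window, step 1)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_counts(methods: Iterable[Sequence[str]], n: int) -> Counter:
    """Count n-grams method by method, so no window spans two methods."""
    counts: Counter = Counter()
    for opcodes in methods:
        counts.update(opcode_ngrams(opcodes, n))
    return counts


def binary_features(counts: Counter, vocabulary: Sequence[Gram]) -> List[int]:
    """Encode an app as 0/1 flags: does each vocabulary n-gram occur?"""
    return [1 if gram in counts else 0 for gram in vocabulary]


# Toy input (assumption): two methods from one decompiled app.
sample_methods = [
    ["const-string", "invoke-virtual", "move-result-object", "return-object"],
    ["invoke-virtual", "move-result-object", "invoke-static"],
]

counts = ngram_counts(sample_methods, n=2)
vocabulary = sorted(counts)  # in practice built from the whole training set
vector = binary_features(counts, vocabulary)
```

A vector of this kind could then be fed to a classifier such as the deep belief network the abstract mentions. In a real pipeline the vocabulary would be fixed on the training set and reused unchanged for test samples.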

CLC Number: TP309.5

ZHANG Zong-mei, GUI Sheng-lin, REN Fei. Android Malware Detection Based on N-gram[J]. Computer Science, 2019, 46(2): 145-151. https://doi.org/10.11896/j.issn.1002-137X.2019.02.023


