Computer Science, 2016, Vol. 43, Issue (7): 13-18. doi: 10.11896/j.issn.1002-137X.2016.07.002


Review of Malware Detection Based on Data Mining

HUANG Hai-xin, ZHANG Lu and DENG Li   

Online: 2018-12-01; Published: 2018-12-01

Abstract: Data mining is a statistics-based method for automatically discovering rules in data. By analyzing large numbers of samples, it can build discriminative models whose decision rules are difficult for an attacker to learn and deliberately evade. Data mining for malware detection has attracted widespread interest and has developed rapidly in recent years. This paper surveys research on malware detection based on data mining. Research results on feature extraction, feature selection, classification models, and methods for evaluating their performance are analyzed and compared in detail. Finally, the open challenges and future prospects of the field are discussed.

Key words: Data mining, Machine learning, Malware detection, Feature extraction, Feature selection
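The abstract outlines a four-stage pipeline: feature extraction, feature selection, a classification model, and performance evaluation. The following is a minimal, hypothetical Python sketch of that pipeline, not the method of any particular surveyed work. It assumes overlapping byte 3-grams as features, document-frequency feature selection, a hand-written Bernoulli naive Bayes classifier, and accuracy, true-positive rate (detection rate) and false-positive rate as metrics. All function names and the toy byte strings are made up for illustration.

```python
import math
from collections import Counter

# Feature extraction (assumed): the set of overlapping byte n-grams in a sample.
def byte_ngrams(data: bytes, n: int = 3) -> set:
    return {data[i:i + n] for i in range(len(data) - n + 1)}

# Feature selection (assumed): keep the k n-grams that occur in the most training
# samples (document frequency); ties are broken by byte value for determinism.
def select_by_doc_freq(samples: list, k: int) -> list:
    df = Counter()
    for grams in samples:
        df.update(grams)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]

# Represent one sample as a 0/1 vector over the selected vocabulary.
def to_vector(grams: set, vocab: list) -> list:
    return [1 if g in grams else 0 for g in vocab]

# Classification model (assumed): Bernoulli naive Bayes with Laplace smoothing.
class BernoulliNaiveBayes:
    def fit(self, X: list, y: list) -> "BernoulliNaiveBayes":
        self.classes = sorted(set(y))
        self.log_prior, self.log_feat = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature present | class), smoothed so no probability is 0 or 1.
            probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                     for j in range(len(X[0]))]
            self.log_feat[c] = [(math.log(p), math.log(1 - p)) for p in probs]
        return self

    def predict(self, x: list) -> int:
        # Pick the class with the largest log posterior (up to a constant).
        def log_posterior(c):
            return self.log_prior[c] + sum(
                lp if v else lq for v, (lp, lq) in zip(x, self.log_feat[c]))
        return max(self.classes, key=log_posterior)

# Performance evaluation: accuracy, true-positive rate and false-positive rate.
def evaluate(y_true: list, y_pred: list, positive: int = 1) -> dict:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

if __name__ == "__main__":
    # Made-up toy data: label 1 = "malware", 0 = "benign".
    train = [(b"\x90\x90\xde\xad\xbe\xef\x00", 1), (b"\xde\xad\xbe\xef\xcc", 1),
             (b"\x00\xde\xad\xbe\xef\xff", 1), (b"HELLO WORLD", 0),
             (b"HELLO THERE", 0), (b"SAY HELLO!", 0)]
    test = [(b"\xcc\xde\xad\xbe\xef", 1), (b"OH HELLO", 0)]

    train_grams = [byte_ngrams(data) for data, _ in train]
    vocab = select_by_doc_freq(train_grams, k=5)
    model = BernoulliNaiveBayes().fit(
        [to_vector(g, vocab) for g in train_grams], [label for _, label in train])
    preds = [model.predict(to_vector(byte_ngrams(data), vocab)) for data, _ in test]
    print(evaluate([label for _, label in test], preds))
```

Each stage is a separate function, so another feature selector (for example, information gain) or classifier (for example, an SVM or a decision tree) could replace one stage without changing the others. A realistic evaluation would use a large corpus of executables and held-out test data; the toy data here only exercises the code.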

