Computer Science, 2019, Vol. 46, Issue (5): 92-99. doi: 10.11896/j.issn.1002-137X.2019.05.014


Malware Detection Algorithm for Improving Active Learning

LI Yi-hong, LIU Fang-zheng, DU Zhen-yu   

1. (Electronic Countermeasure Institute, National University of Defense Technology, Hefei 230037, China)
Received: 2018-04-26; Revised: 2018-08-15; Published: 2019-05-15

Abstract: Traditional malware detection relies on a large number of labeled samples. For new malware, however, few labeled samples are usually available, so traditional machine learning detection methods struggle to achieve good detection results. This paper therefore proposes a malware detection algorithm based on active learning. It combines a sample selection strategy based on Maximum Distance with a sample tagging strategy based on Minimum Risk Estimate, and achieves good detection results with only a small number of labeled samples. Experimental results show that the proposed algorithm outperforms the overall detection method without active learning, and that when 10% of the samples are labeled, its active learning effect is better than that of the random selection strategy. Moreover, the algorithm has a lower time cost than an active learning strategy that relies on manual tagging.
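
The abstract names the two strategies but does not define them. The sketch below shows one plausible reading and is not the authors' method. It assumes that samples are binary feature vectors (for example, presence or absence of particular API calls), that "Maximum Distance" means picking the unlabeled samples whose Hamming distance to the nearest labeled sample is largest, and that "Minimum Risk Estimate" means a Bayes minimum-risk decision estimated from the labels of nearby samples. Samples with low estimated risk are tagged automatically, and the rest go to a human oracle. The function names, the cost matrix `COST`, the threshold `max_risk` and the neighbour count `n_neighbors` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical label encoding and cost matrix (not from the paper).
BENIGN, MALWARE = 0, 1
# COST[assigned][true]: assigning BENIGN to real malware (a missed detection)
# is assumed to cost five times as much as a false alarm.
COST: Dict[int, Dict[int, float]] = {
    BENIGN: {BENIGN: 0.0, MALWARE: 5.0},
    MALWARE: {BENIGN: 1.0, MALWARE: 0.0},
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def select_max_distance(pool: Sequence[Sequence[int]], unlabeled: List[int],
                        labeled: List[int], k: int) -> List[int]:
    """Return the k unlabeled indices farthest from the labeled set.

    A candidate's score is its Hamming distance to the nearest labeled
    sample, so high scores mark regions no label covers yet. Ties keep
    the original order because Python's sort is stable.
    """
    def score(i: int) -> int:
        return min(hamming(pool[i], pool[j]) for j in labeled)
    return sorted(unlabeled, key=score, reverse=True)[:k]


def min_risk_label(pool: Sequence[Sequence[int]], i: int,
                   labeled: Dict[int, int], n_neighbors: int = 2) -> Tuple[int, float]:
    """Return (label, expected risk) for the minimum-risk label of sample i.

    P(y | x) is estimated from the labels of the n_neighbors nearest
    labeled samples; the expected risk of assigning c is
    sum over y of COST[c][y] * P(y | x).
    """
    nearest = sorted(labeled, key=lambda j: hamming(pool[i], pool[j]))[:n_neighbors]
    votes = Counter(labeled[j] for j in nearest)
    posterior = {y: votes[y] / len(nearest) for y in (BENIGN, MALWARE)}
    risks = {c: sum(COST[c][y] * posterior[y] for y in posterior) for c in COST}
    best = min(risks, key=risks.get)
    return best, risks[best]


def active_learning(pool: Sequence[Sequence[int]], labeled: Dict[int, int],
                    oracle: Callable[[int], int], rounds: int = 5, batch: int = 2,
                    max_risk: float = 0.3, n_neighbors: int = 2) -> Tuple[Dict[int, int], int]:
    """Grow the labeled set; return (labels, number of oracle queries)."""
    labeled = dict(labeled)  # copy so the caller's seed set is untouched
    queries = 0
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        for i in select_max_distance(pool, unlabeled, list(labeled), batch):
            label, risk = min_risk_label(pool, i, labeled, n_neighbors)
            if risk <= max_risk:
                labeled[i] = label        # low risk: tag automatically
            else:
                labeled[i] = oracle(i)    # high risk: ask a human analyst
                queries += 1
    return labeled, queries


if __name__ == "__main__":
    # Toy pool: 6-bit feature vectors; the oracle simply reads ground truth.
    pool = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1, 0], [0, 0, 0, 0, 1, 0], [1, 1, 1, 1, 0, 1]]
    truth = [0, 0, 1, 1, 0, 1]
    labels, queries = active_learning(pool, {0: BENIGN, 2: MALWARE}, lambda i: truth[i])
    print(dict(sorted(labels.items())), "queries:", queries)
    # → {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1} queries: 2
```

In the toy run, the first round picks samples whose two nearest labeled neighbours disagree, so their estimated risk is high and they go to the oracle; in the second round the picks have unanimous neighbours and are tagged automatically, so only 2 of the 4 unlabeled samples need an analyst. Scoring every candidate once per round keeps the sketch short; a greedy farthest-first variant would rescore the remaining candidates after each pick.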

Key words: Active learning, Estimated risk, Features, Malware, Sample

CLC Number: TP393.08