Computer Science, 2021, Vol. 48, Issue (1): 280-286. doi: 10.11896/jsjkx.200900099

• Information Security

Malicious Code Family Detection Method Based on Knowledge Distillation

WANG Run-zheng (1), GAO Jian (1, 2), HUANG Shu-hua (1, 2), TONG Xin (1)

  1. College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  2. Key Laboratory of Safety Precautions and Risk Assessment, Beijing 102623, China
  • Received: 2020-09-13; Revised: 2020-11-10; Online: 2021-01-15; Published: 2021-01-15
  • About the authors: WANG Run-zheng, born in 1996, postgraduate, is a member of the China Computer Federation. His main research interests include cyber security and malware.
    GAO Jian, born in 1982, Ph.D. His main research interests include cyber security, malware and botnets.
  • Supported by: National Key R&D Program of China, Key Program of the National Social Science Foundation of China (20AZD114), 2020 Special Project of Science and Technology Strengthening Police of the Ministry of Public Security (2020GABJC01) and Basic Scientific Research Operating Expenses of the People's Public Security University of China (2019JKF218).

Abstract: In recent years, new varieties of malicious code have emerged in an endless stream, and malware has become more covert and persistent. Fast and effective methods for identifying malicious samples are therefore urgently needed. To address this, a malicious code family detection method based on knowledge distillation is proposed. The method decompiles malicious samples and uses malicious code visualization to convert their binary text into images, which avoids dependence on traditional feature engineering. In the teacher network, a residual network extracts deep texture features from the images. A channel-domain attention mechanism is also introduced to extract key information from each image by adjusting the weight of each feature channel. Deep neural network detection models have large numbers of parameters and consume substantial computing resources. To address this, and to speed up identification of the samples under test, the teacher network is used to guide the training of a smaller student network. The results show that the student network maintains malicious code family detection performance while reducing model complexity. This facilitates batch sample detection and deployment on mobile terminals.

Key words: Malicious family, Knowledge distillation, Attention mechanism, Residual network

CLC Number: TP309
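The abstract says that samples are turned into images by malicious code visualization, but it does not give the exact mapping. A common scheme treats every byte as one 8-bit grayscale pixel and writes the bytes row by row at a fixed width; this minimal Python sketch assumes that scheme. Several details are hypothetical choices, not details taken from the paper:

- the helper names `hexdump_to_bytes` and `bytes_to_grayscale`;
- the hex-dump line layout (an address column followed by byte tokens, with `??` for unreadable bytes);
- the default width of 256.

```python
from typing import List


def hexdump_to_bytes(text: str) -> bytes:
    """Parse hex-dump lines such as "00401000 56 8D 44 24".

    Assumed layout (hypothetical): the first token on each line is an
    address and is skipped; unreadable bytes written as "??" become 0.
    """
    out = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        for tok in tokens[1:]:  # skip the address column
            out.append(0 if tok == "??" else int(tok, 16))
    return bytes(out)


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay the bytes out row by row; each byte becomes one 0-255 gray pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the final row
        rows.append(row)
    return rows


if __name__ == "__main__":
    dump = "00401000 56 8D 44 24\n00401004 ?? FF 00 10"
    print(bytes_to_grayscale(hexdump_to_bytes(dump), width=3))
    # [[86, 141, 68], [36, 0, 255], [0, 16, 0]]
```

In practice the resulting matrix would still have to be resized to the fixed input size of the network. That step is left out of this sketch.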
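For the channel-domain attention in the teacher network, the abstract only says that channel weights are used to highlight key information. The squeeze-and-excitation (SE) block of Hu et al. is a standard way to do this, and this pure-Python sketch assumes it. The block has three steps:

1. Global average pooling produces one value per channel.
2. A bottleneck of two fully connected layers, with ReLU and then sigmoid, turns those values into channel weights.
3. Each channel is rescaled by its weight.

The function names, and supplying `w1` and `w2` as plain nested lists, are illustrative assumptions. In a trained network these would be learned parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix w (rows x len(v)) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def squeeze_excite(feature: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Rescale each channel of a C x H x W feature map by a per-channel weight.

    w1 has shape (C/r) x C and w2 has shape C x (C/r), where r is the
    reduction ratio; both are hypothetical stand-ins for learned weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields weights in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    s = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]
    # Scale: multiply every activation in channel c by s[c].
    return [[[x * s_c for x in row] for row in ch] for ch, s_c in zip(feature, s)]
```

In an SE-ResNet-style block, this rescaling is applied to the output of the residual branch before the identity shortcut is added. The abstract does not say whether the paper places it exactly there.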
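The abstract says that the teacher network guides the training of the student network, but it does not state the loss. This sketch assumes a common choice, the distillation loss of Hinton et al. It adds two terms:

- a KL divergence between the temperature-softened teacher and student outputs, scaled by T²;
- an ordinary cross-entropy on the true family label.

The function names and the defaults `temperature=4.0` and `alpha=0.7` are illustrative, not the paper's settings.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 4.0,
                      alpha: float = 0.7) -> float:
    """Hinton-style loss: soft-target KL term plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    # Standard cross-entropy against the true family label at T = 1.
    ce = -math.log(softmax(student_logits)[label])
    # Scaling by T^2 keeps soft-target gradients comparable as T changes.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student's outputs match the teacher's, the KL term is zero and only the weighted hard-label term remains. A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for the incorrect families.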