Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 240800062-6. doi: 10.11896/jsjkx.240800062

• Information Security •

Research on Malware Classification Algorithm Based on Instruction Flow Graph

XING Yuyang, WANG Baohui

  1. School of Software, Beihang University, Beijing 100191, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Corresponding author: WANG Baohui (wangbh@buaa.edu.cn)
  • About the author: 15530312352@163.com

Abstract: In recent years, malicious code has become increasingly rampant, with both its volume and its variety growing rapidly. Machine learning methods have therefore been widely adopted to improve the efficiency of malicious code identification and classification. This paper focuses on the multi-class classification of malicious code. It uses static analysis, combining disassembly, graph construction, and graph-theoretic feature analysis, to extract features from the original files of malicious code samples. Building on traditional control flow graph (CFG) features and bytecode features, an Instruction Flow Graph (IFG) feature is proposed. The IFG, CFG, and bytecode features are each used to train machine learning models in a side-by-side comparison experiment. In terms of training results: compared with the CFG feature, the IFG feature raises model accuracy by about 5%; compared with the bytecode feature, it raises model accuracy by 0.3% and shortens model training time by more than 60%.

Key words: Malicious code, Instruction flow graph, Classification, Machine learning
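The abstract names three steps (disassembly, graph construction, graph-theoretic feature analysis) but does not specify how the IFG is built. The Python sketch below only illustrates the general idea under stated assumptions: nodes are opcode mnemonics, a directed edge records that one mnemonic directly follows another, and the feature vector holds a few basic graph statistics. The function names, the sample opcode list, and the choice of features are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_instruction_graph(opcodes):
    """Build a directed graph from an opcode sequence (assumed construction).

    Nodes are distinct mnemonics; the edge (a, b) is weighted by how many
    times b directly follows a in the sequence.
    """
    nodes = set(opcodes)
    edges = Counter(zip(opcodes, opcodes[1:]))
    return nodes, edges


def graph_features(nodes, edges):
    """Return a small vector of graph statistics (illustrative choice)."""
    n = len(nodes)
    m = len(edges)
    # Self-loops such as (mov, mov) are possible, so n * n edges can exist.
    density = m / (n * n) if n else 0.0
    out_degree = Counter(src for src, _ in edges)
    max_out_degree = max(out_degree.values(), default=0)
    total_weight = sum(edges.values())
    return [n, m, density, max_out_degree, total_weight]


# Hypothetical disassembler output, reduced to mnemonics only.
sample = ["push", "mov", "call", "mov", "cmp", "jz", "mov", "ret"]
nodes, edges = build_instruction_graph(sample)
features = graph_features(nodes, edges)
```

A vector like `features`, computed per sample, could then be passed to any standard classifier. The abstract does not state which models were trained.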

CLC Number:

  • TP309