Computer Science, 2025, Vol. 52, Issue (12): 321-330. doi: 10.11896/jsjkx.250300056
杨一哲, 芦天亮, 彭舒凡, 李啸林
YANG Yizhe, LU Tianliang, PENG Shufan, LI Xiaolin
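Abstract: Malware analysis methods based on API sequences can effectively capture the behavioral characteristics of a program at runtime. However, existing detection methods usually consider only API names and ignore parameters and return values, or fail to fully exploit their semantic information and the correlations among parameters, which limits detection performance. To address this problem, a malware detection method that combines systematic feature engineering with a deep neural network architecture is proposed. The method applies structured encoding to API sequences according to the data characteristics of API names, parameters, and return values. It then extracts multi-scale features of each API call through multiple RefConv convolution blocks, and finally feeds the feature vectors into a parallel recurrent neural network based on BiGRU-BiLSTM to learn the long- and short-term dependencies among API sequences. An API sequence dataset of size 25,000 was constructed and released for the experiments. In the overall performance evaluation, the proposed method achieved an accuracy of 93.55%. Temporal concept drift, spatial concept drift, and ablation experiments further verified that the proposed method can effectively detect malware.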
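The abstract does not spell out how the structured encoding works, so the following is only a minimal sketch of the general idea: turning each API call (name, parameters, return value) into a fixed-length numeric vector that a convolutional or recurrent model could consume. Everything specific here is an assumption made for illustration and not the authors' scheme: the use of feature hashing, the slot widths `NAME_DIM` and `PARAM_DIM`, the record keys `api`, `args`, and `ret`, and the convention that a return value of 0 means success.

```python
import hashlib
from typing import Any, Dict, List

NAME_DIM = 8    # assumed width of the hashed API-name slot
PARAM_DIM = 16  # assumed width of the hashed parameter slot


def _bucket(token: str, dim: int) -> int:
    """Map a string to a stable bucket index in [0, dim)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % dim


def encode_call(name: str, params: Dict[str, Any], ret: int) -> List[float]:
    """Encode one API call as [name one-hot | parameter counts | return flag]."""
    name_vec = [0.0] * NAME_DIM
    name_vec[_bucket(name, NAME_DIM)] = 1.0

    param_vec = [0.0] * PARAM_DIM
    for key, value in params.items():
        # Hash key and value together so the pairing between them is kept.
        param_vec[_bucket(f"{key}={value}", PARAM_DIM)] += 1.0

    # Assumed convention: a return value of 0 signals success.
    ret_vec = [1.0 if ret == 0 else 0.0]
    return name_vec + param_vec + ret_vec


def encode_sequence(calls: List[Dict[str, Any]]) -> List[List[float]]:
    """Encode a whole trace as a (sequence length x feature width) matrix."""
    return [
        encode_call(c["api"], c.get("args", {}), c.get("ret", 0))
        for c in calls
    ]


# Hypothetical two-call trace, used only to show the output shape.
trace = [
    {"api": "NtCreateFile", "args": {"filepath": "C:\\tmp\\a.exe"}, "ret": 0},
    {"api": "RegOpenKeyExA", "args": {"regkey": "HKLM\\Software"}, "ret": 2},
]
matrix = encode_sequence(trace)
```

Each row of `matrix` has `NAME_DIM + PARAM_DIM + 1` entries, one row per API call. In a pipeline like the one the abstract describes, a matrix of this kind would be the input to the per-call feature extractor and then to the sequence model; those stages are not sketched here.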