Computer Science ›› 2025, Vol. 52 ›› Issue (12): 321-330. doi: 10.11896/jsjkx.250300056

• Information Security •

Malware Detection Based on API Sequence Feature Engineering and Feature Learning

YANG Yizhe, LU Tianliang, PENG Shufan, LI Xiaolin   

  1. College of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2025-03-11 Revised: 2025-06-04 Online: 2025-12-15 Published: 2025-12-09
  • About author: YANG Yizhe, born in 2001, postgraduate, is a member of CCF (No. Z0786G). His main research interests include malware detection, among others.
    LU Tianliang, born in 1985, Ph.D., professor, Ph.D. supervisor. His main research interests include cyber security and artificial intelligence.
  • Supported by:
    This work was supported by the Science and Technology Program of the Ministry of Public Security (2023JSM09).

Abstract: API sequence-based malware analysis methods can effectively capture the runtime behavior of programs. However, existing detection approaches typically focus solely on API names while neglecting parameters and return values, or fail to adequately explore their semantic information and inter-parameter correlations, which limits detection performance. To address this, this paper proposes a malware detection method that combines systematic feature engineering with a deep neural network architecture. Specifically, the method applies structured encoding to API sequences based on the data characteristics of API names, parameters, and return values. Multiple RefConv convolutional blocks are then employed to extract multi-scale features for each API call. Finally, the feature vectors are fed into a parallel recurrent neural network based on BiGRU-BiLSTM to learn long-term and short-term dependencies within API sequences. In experiments conducted on a dataset containing 25 000 API sequences, the method achieves 93.55% accuracy in comprehensive performance tests. Validation through temporal concept drift, spatial concept drift, and ablation experiments demonstrates that the proposed method can effectively detect malware.
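
The structured-encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual encoding, so everything below is an assumption: the hashing trick stands in for the paper's feature engineering, and the names `encode_call`, `encode_sequence`, and the segment sizes `NAME_DIM`, `PARAM_DIM`, `RET_DIM` are hypothetical. The sketch only shows the general idea of turning each API call (name, parameters, return value) into one fixed-length row of a sequence matrix.

```python
import zlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical segment sizes; the abstract does not give the real dimensions.
NAME_DIM = 8
PARAM_DIM = 16
RET_DIM = 4

ApiCall = Tuple[str, Dict[str, str], str]  # (API name, parameters, return value)


def _bucket(token: str, dim: int) -> int:
    """Map a string to a stable bucket index in [0, dim) via CRC32."""
    return zlib.crc32(token.encode("utf-8")) % dim


def encode_call(name: str, params: Dict[str, str], ret: str) -> List[float]:
    """Encode one API call as concatenated name | parameter | return segments."""
    name_vec = [0.0] * NAME_DIM
    param_vec = [0.0] * PARAM_DIM
    ret_vec = [0.0] * RET_DIM

    name_vec[_bucket(name.lower(), NAME_DIM)] = 1.0
    # Hash each key=value pair; colliding pairs accumulate in the same bucket.
    for key, value in sorted(params.items()):
        param_vec[_bucket(f"{key}={value}".lower(), PARAM_DIM)] += 1.0
    ret_vec[_bucket(str(ret), RET_DIM)] = 1.0
    return name_vec + param_vec + ret_vec


def encode_sequence(calls: Sequence[ApiCall], max_len: int) -> List[List[float]]:
    """Encode a call sequence, truncating or zero-padding to max_len rows."""
    width = NAME_DIM + PARAM_DIM + RET_DIM
    rows = [encode_call(*call) for call in calls[:max_len]]
    rows += [[0.0] * width for _ in range(max_len - len(rows))]
    return rows


# Made-up trace for illustration only; not taken from the paper's dataset.
trace: List[ApiCall] = [
    ("NtCreateFile", {"filepath": "C:\\Temp\\a.exe", "access": "0x40000000"}, "0"),
    ("NtWriteFile", {"length": "4096"}, "0"),
]
matrix = encode_sequence(trace, max_len=4)
print(len(matrix), len(matrix[0]))
```

Under these assumptions, each row is a per-call feature vector and the matrix has one row per call. That is the general shape of input that per-call convolutional blocks and a downstream recurrent network would consume.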

Key words: Malware detection, API sequence, Feature engineering, RefConv, BiGRU, BiLSTM

CLC Number: 

  • TP309