Computer Science ›› 2026, Vol. 53 ›› Issue (5): 419-425. doi: 10.11896/jsjkx.250400070

• Information Security •

Attack Capability Feature Learning and Aggregation Method Based on Semantic Co-occurrence Network

LI Jingwen1, ZHANG Ru1, LIU Gongshen2, ZHANG Tong1   

  1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2 School of Cyberspace Security, Shanghai Jiao Tong University, Shanghai 200030, China
  • Received: 2025-04-15 Revised: 2025-06-25 Published: 2026-05-08
  • About author: LI Jingwen, born in 1997, postgraduate. Her main research interests include threat intelligence and APT detection.
    ZHANG Ru, Ph.D, professor, Ph.D supervisor. Her main research interests include content security and intelligent security analytics.
  • Supported by:
    Special Project for Industrial Foundation Reconstruction and High-quality Development of Manufacturing Industry (0747-2361SCCZA194).

Abstract: The attack capability features of malicious samples reflect the technical tactics of attackers and are an important clue for advanced persistent threat (APT) attribution analysis. However, the continuous evolution of APT attack techniques tends to cause feature drift, which makes it difficult for traditional methods based on fixed feature engineering to adapt to new feature distributions and thereby weakens the model's generalization ability. To address this issue, a robust attribution framework for attack capability features is proposed. Firstly, a semantic co-occurrence network of attack capability features is constructed, and an inductive graph neural network is introduced to jointly learn the semantics and co-occurrence relationships of features. Semantic clustering is then used to compress the feature space and generate a more stable aggregated feature representation. Secondly, a feature induction method is designed to achieve semantic generalization to unknown features, and a soft voting mechanism is used to integrate the predictions of multiple models, thereby improving the robustness and generalization capability of the attribution analysis. Experiments on a malicious sample dataset containing 91 APT groups show that this method achieves macro-average and micro-average F1 scores of 71.46% and 81.15%, respectively, demonstrating its effectiveness.
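
The soft voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base models, their weights, or the label set, so the three model outputs, the group names, and the function names `soft_vote` and `predict` below are hypothetical. The sketch only shows the general idea of averaging per-class probabilities from several models and picking the class with the highest average.

```python
from typing import Dict, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Return the weighted average of per-class probabilities (soft voting)."""
    if not model_probs:
        raise ValueError("need the probabilities of at least one model")
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    # Union of all labels; a model that omits a label contributes 0 for it.
    labels = sorted({label for probs in model_probs for label in probs})
    return {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, model_probs)) / total
        for label in labels
    }


def predict(
    model_probs: Sequence[Dict[str, float]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Return the label with the highest averaged probability."""
    averaged = soft_vote(model_probs, weights)
    return max(averaged, key=averaged.get)


# Hypothetical class probabilities from three classifiers for one sample.
model_a = {"GroupA": 0.6, "GroupB": 0.3, "GroupC": 0.1}
model_b = {"GroupA": 0.2, "GroupB": 0.7, "GroupC": 0.1}
model_c = {"GroupA": 0.3, "GroupB": 0.5, "GroupC": 0.2}

averaged = soft_vote([model_a, model_b, model_c])
winner = predict([model_a, model_b, model_c])
print(winner, averaged)
```

Unlike hard (majority) voting, this scheme keeps each model's confidence, so a model that is strongly confident about one group can outweigh two models that are only weakly confident about another.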

Key words: Malicious samples, Attack capability features, Advanced persistent threat, Attribution analysis, Semantic co-occurrence network

CLC Number: 

  • TP309