Computer Science, 2026, Vol. 53, Issue 5: 419-425. doi: 10.11896/jsjkx.250400070

• Information Security •

Attack Capability Feature Learning and Aggregation Method Based on Semantic Co-occurrence Network

LI Jingwen1, ZHANG Ru1, LIU Gongshen2, ZHANG Tong1

  1 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2 School of Cyberspace Security, Shanghai Jiao Tong University, Shanghai 200030, China
  • Received: 2025-04-15  Revised: 2025-06-25  Online: 2026-05-08
  • Corresponding author: ZHANG Ru (zhangru@bupt.edu.cn)
  • About author: LI Jingwen (lijingwen310@bupt.edu.cn), born in 1997, postgraduate. Her main research interests include threat intelligence and APT detection.
    ZHANG Ru, Ph.D., professor, Ph.D. supervisor. Her main research interests include content security and intelligent security analytics.
  • Supported by:
    Special Project for Industrial Foundation Reconstruction and High-quality Development of Manufacturing Industry (0747-2361SCCZA194).

Abstract: The attack capability features of malicious samples reflect an attacker's technical proficiency and are therefore an important clue for advanced persistent threat (APT) attribution analysis. However, the continuous evolution of APT attack techniques tends to cause feature drift: methods built on fixed feature engineering struggle to adapt to new feature distributions, which weakens the model's generalization ability. To mitigate this problem, a robust attribution framework based on attack capability features is proposed. First, a semantic co-occurrence network of attack capability features is constructed, and an inductive graph neural network is introduced to jointly learn the semantics and co-occurrence relationships of the features; semantic clustering then compresses the feature space to produce a more stable aggregated feature representation. Second, a feature induction method is designed to generalize semantically to unknown features, and a soft voting mechanism integrates the predictions of multiple models, improving the robustness and generalization capability of the attribution analysis. Experiments on a malicious sample dataset covering 91 APT groups show that the model achieves macro-averaged and micro-averaged F1 scores of 71.46% and 81.15%, respectively, demonstrating the effectiveness of the proposed method.
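
The abstract's first step, jointly learning feature semantics and co-occurrence, can be pictured with a minimal Python sketch. It assumes that each sample is reduced to a set of attack capability feature names, that two features are connected when they occur in the same sample (edge weight = number of such samples), and that the inductive graph neural network behaves like a single GraphSAGE-style layer (Hamilton et al., 2017) with a co-occurrence-weighted mean aggregator. The feature names, the random vectors standing in for text embeddings, the untrained matrix `W`, and the helpers `build_cooccurrence_graph` and `sage_layer` are hypothetical; the paper's actual architecture and training objective are not given here.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each malicious sample reduced to a set of attack
# capability feature names (names are illustrative, not from the paper).
SAMPLES = [
    {"keylogging", "screen_capture", "http_c2"},
    {"keylogging", "http_c2", "persistence_run_key"},
    {"screen_capture", "http_c2"},
    {"dns_tunneling", "persistence_run_key"},
]


def build_cooccurrence_graph(samples):
    """Undirected graph: edge weight = number of samples containing both features."""
    graph = defaultdict(dict)
    for feats in samples:
        for a, b in combinations(sorted(feats), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def sage_layer(embeddings, graph, W):
    """One GraphSAGE-style layer with a co-occurrence-weighted mean aggregator:
    h_v' = L2-normalise(ReLU(W . [h_v || sum_u w_vu * h_u / sum_u w_vu]))."""
    out = {}
    for v, h_v in embeddings.items():
        nbrs = {u: w for u, w in graph.get(v, {}).items() if u in embeddings}
        total = sum(nbrs.values())
        agg = [0.0] * len(h_v)
        for u, w in nbrs.items():
            for i, x in enumerate(embeddings[u]):
                agg[i] += w * x / total
        x = h_v + agg  # list concatenation: [own vector || aggregated neighbours]
        z = [max(0.0, sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
        norm = math.sqrt(sum(c * c for c in z))
        out[v] = [c / norm for c in z] if norm > 0 else z
    return out


rng = random.Random(0)
IN_DIM, OUT_DIM = 4, 3
graph = build_cooccurrence_graph(SAMPLES)
# Placeholder "semantic" vectors; a real system would obtain them from a text encoder.
semantic = {f: [rng.gauss(0, 1) for _ in range(IN_DIM)] for f in graph}
# Untrained weights; in practice W would be learned from the attribution objective.
W = [[rng.gauss(0, 1) for _ in range(2 * IN_DIM)] for _ in range(OUT_DIM)]
learned = sage_layer(semantic, graph, W)

# Inductive use: an unseen feature is embedded with the same W, given only its
# semantic vector and the known features it co-occurs with.
semantic["smb_lateral_movement"] = [rng.gauss(0, 1) for _ in range(IN_DIM)]
graph["smb_lateral_movement"] = {"persistence_run_key": 1}
new_vec = sage_layer(semantic, graph, W)["smb_lateral_movement"]
print(graph["http_c2"])  # {'keylogging': 2, 'screen_capture': 2, 'persistence_run_key': 1}
```

Because `sage_layer` needs only a node's own vector and its neighbours, the same `W` embeds the previously unseen `smb_lateral_movement` without retraining. This inductive property is what makes such a model a reasonable fit for drifting feature sets.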

Key words: Malicious samples, Attack capability features, Advanced persistent threat, Attribution analysis, Semantic co-occurrence network
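
Semantic clustering and feature induction might be sketched as follows. The sketch assumes DBSCAN (Ester et al., 1996) with cosine distance as the clustering algorithm and reads "feature induction" as assigning an unseen feature to the cluster of its nearest core feature within `eps`; both readings, the 2-D toy embeddings, the `eps`/`min_pts` values, and the helpers `induce` and `aggregate` are illustrative assumptions, not the paper's settings. Each sample is then represented by how many of its features fall into each cluster.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with cosine distance. Returns (labels, is_core); label -1 = noise."""
    n = len(points)
    neigh = [[j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
             for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neigh]  # neighbourhood includes the point itself
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if not is_core[i]:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if is_core[j]:
                queue.extend(neigh[j])  # only core points expand the cluster
    return labels, is_core


def induce(vec, points, labels, is_core, eps):
    """Hypothetical feature induction: cluster of the nearest core feature
    within eps, or -1 if the vector is too far from every core feature."""
    best_label, best_dist = -1, eps
    for p, label, core in zip(points, labels, is_core):
        d = cosine_distance(vec, p)
        if core and d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def aggregate(sample_vecs, points, labels, is_core, eps, n_clusters):
    """Bag-of-clusters representation: feature count per cluster (noise dropped)."""
    counts = Counter(induce(v, points, labels, is_core, eps) for v in sample_vecs)
    return [counts.get(c, 0) for c in range(n_clusters)]


# Hypothetical 2-D semantic embeddings of known features.
FEATURES = {
    "keylogging": [1.0, 0.1],
    "screen_capture": [0.9, 0.2],
    "clipboard_theft": [0.95, 0.15],
    "http_c2": [0.1, 1.0],
    "dns_tunneling": [0.2, 0.9],
    "https_beacon": [0.15, 0.95],
}
EPS, MIN_PTS = 0.05, 2
points = list(FEATURES.values())
labels, is_core = dbscan(points, EPS, MIN_PTS)
n_clusters = max(labels) + 1

unseen = [0.12, 0.98]  # a feature never observed during clustering
sample = [FEATURES["keylogging"], FEATURES["http_c2"], unseen]
print(labels)                                                       # [0, 0, 0, 1, 1, 1]
print(aggregate(sample, points, labels, is_core, EPS, n_clusters))  # [1, 2]
```

In this toy setting six raw features collapse into two cluster counts, and the unseen vector `[0.12, 0.98]` increments an existing cluster instead of introducing a dimension the classifier has never seen; a vector far from every core feature (for example `[-1.0, 0.0]`) is labelled -1 and dropped.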

CLC number: TP309
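
Soft voting averages the class-probability vectors produced by several classifiers and selects the class with the highest mean probability. The sketch below implements that step together with the macro- and micro-averaged F1 scores used in the evaluation. The three probability tables are invented, and equal model weights are an assumption, since the abstract names neither the base models nor any weighting; `soft_vote` and `macro_micro_f1` are hypothetical helper names.

```python
from collections import Counter


def soft_vote(prob_tables, weights=None):
    """Average (optionally weighted) class-probability vectors from several models
    and return, for each sample, the index of the class with the highest mean."""
    weights = weights or [1.0] * len(prob_tables)
    total = sum(weights)
    preds = []
    for per_model in zip(*prob_tables):  # one probability vector per model, same sample
        n_classes = len(per_model[0])
        avg = [sum(w * p[k] for w, p in zip(weights, per_model)) / total
               for k in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def macro_micro_f1(y_true, y_pred, classes):
    """Macro-F1 averages per-class F1; micro-F1 pools TP/FP/FN over all classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Invented probability outputs of three hypothetical models: 4 samples x 3 groups.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
model_b = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6], [0.3, 0.5, 0.2]]
model_c = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.2, 0.6, 0.2]]
y_true = [0, 1, 2, 0]

preds = soft_vote([model_a, model_b, model_c])
macro, micro = macro_micro_f1(y_true, preds, classes=[0, 1, 2])
print(preds)                             # [0, 1, 2, 1]
print(round(macro, 4), round(micro, 4))  # 0.7778 0.75
```

Macro-F1 averages per-group F1 scores, so every APT group counts equally, whereas micro-F1 pools true positives, false positives, and false negatives across groups and, for single-label prediction, equals accuracy (here 3/4 = 0.75). A macro-F1 below micro-F1, as in the reported 71.46% and 81.15%, generally suggests that some groups are attributed less reliably than others.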