Computer Science, 2025, Vol. 52, Issue (11): 40-48. doi: 10.11896/jsjkx.241100118
周昱辰1, 李鹏1,2, 韩科技1,2
ZHOU Yuchen1, LI Peng1,2, HAN Keji1,2
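Abstract: Malware detection and classification are challenged by the complexity and stealth of modern malware. Graph neural networks (GNNs) can effectively model control flow graphs and capture behavioral patterns more accurately, but their "black-box" nature limits interpretability. Moreover, existing methods rely on large amounts of labeled data and generalize poorly, which makes it difficult for them to handle novel malware. Large language models (LLMs) offer strong feature extraction and contextual understanding, handle few-shot data effectively, and can fuse multimodal information, thereby improving analysis accuracy and generalization. Inspired by LLMs and combined with a contrastive learning strategy, this work jointly learns the structure of control flow graphs and their assembly instructions to improve the effectiveness and flexibility of malware analysis. On this basis, the Instruct-Malware framework is designed. The framework adopts a lightweight graph-text alignment projection and, through dual-stage instruction tuning, significantly enhances the flexibility and robustness of malware analysis; it also improves the model's interpretability and makes the decision process transparent. Experimental results show that the proposed framework achieves significant performance gains on malware classification and subgraph identification tasks, surpasses existing mainstream methods, and substantially narrows the gap with specialized models, offering a new approach to building efficient and reliable malware analysis systems.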
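The abstract does not state which contrastive objective is used to align control flow graph embeddings with assembly-instruction text embeddings. As an illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired embeddings; the function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_diagonal_nll(logits):
    """Average cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(graph_embs, text_embs, temperature=0.1):
    """Assumed objective: graph-to-text and text-to-graph InfoNCE, averaged.

    graph_embs[i] and text_embs[i] are treated as a positive pair; every
    other combination in the batch serves as a negative.
    """
    g = [l2_normalize(v) for v in graph_embs]
    t = [l2_normalize(v) for v in text_embs]
    logits = [
        [sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t]
        for gi in g
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_diagonal_nll(logits) + mean_diagonal_nll(logits_transposed))


# Toy batch: two graphs and two instruction texts in a shared 2-D space.
aligned_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each graph embedding points in the same direction as its paired text embedding, and large when the pairs are swapped, which is the behavior a graph-text alignment stage would rely on.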