Computer Science, 2026, Vol. 53, Issue (3): 424-432. doi: 10.11896/jsjkx.250200124
SONG Jianhua1,3,4,5, HE Jiawei1, ZHANG Yan2,3,5
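Abstract: As software vulnerabilities continue to multiply, system security faces severe challenges. Source code vulnerability detection can uncover potential security threats in software applications during the development phase, and is therefore critical to ensuring their security. Deep learning-based detection is currently the mainstream approach. However, many existing deep learning models rely on a single form of feature and fail to fully exploit the global and local information in source code semantics. These models also tend to ignore the differences and similarities between samples, so they perform poorly on complex vulnerability patterns and suffer from high false positive and false negative rates. To address these problems, a dual-channel source code vulnerability detection model based on contrastive learning is proposed. The model uses separate channels to extract global features and local features from source code semantics, and introduces contrastive learning so that the model can learn the similarities and differences between samples and use them to optimize feature extraction. Experimental results show that, on the real-world vulnerability datasets Devign and Reveal, the model significantly improves recall and F1 score over the baseline models: by an average of 14.65 and 6.30 percentage points, respectively, on Devign, and by an average of 31.18 and 22.44 percentage points, respectively, on Reveal.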
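The abstract does not state which contrastive objective the model uses. As a minimal sketch, assuming a label-aware (supervised) contrastive loss over cosine similarities, the idea of pulling same-label samples together and pushing different-label samples apart might look like the following. The function names, the temperature value, and the toy vectors are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Average label-aware contrastive loss over anchors that have a positive.

    For each anchor i, samples sharing its label are positives and all
    other samples are negatives. The loss is low when positives are more
    similar to the anchor than negatives are.
    NOTE: hypothetical helper; the paper's actual loss is not given here.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-label partner in this batch
        # Temperature-scaled similarities to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        # Mean negative log-probability of selecting a positive.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: label 1 = vulnerable, label 0 = non-vulnerable (made-up vectors).
labels = [1, 1, 0, 0]
well_separated = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
mixed_up = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]

loss_good = supervised_contrastive_loss(well_separated, labels)
loss_bad = supervised_contrastive_loss(mixed_up, labels)
print(loss_good < loss_bad)
```

In this toy batch the loss is smaller when same-label embeddings sit close together, which is the pressure a contrastive term would place on the feature extractors. In a real model the embeddings would come from the network itself and the loss would be combined with a classification loss; how the paper weights the two is not stated in the abstract.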