Computer Science, 2026, Vol. 53, Issue (3): 424-432. doi: 10.11896/jsjkx.250200124

• Information Security •

  • Corresponding author: HE Jiawei (1766962638@qq.com)
  • About author: (sjhhubu@126.com)

Dual-channel Source Code Vulnerability Detection Model Based on Contrastive Learning

SONG Jianhua1,3,4,5, HE Jiawei1, ZHANG Yan2,3,5   

  1 School of Cyber Science and Technology, Hubei University, Wuhan 430062, China
  2 School of Computer Science, Hubei University, Wuhan 430062, China
  3 Key Laboratory of Intelligent Sensing System and Security (Hubei University), Ministry of Education, Wuhan 430062, China
  4 Hubei Provincial Engineering Research Center of Intelligent Connected Vehicle Network Security, Wuhan 430062, China
  5 Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China
  • Received: 2025-02-27 Revised: 2025-05-23 Online: 2026-03-12
  • About author: SONG Jianhua, born in 1973, Ph.D., professor, master's supervisor, is a member of CCF (No. 27785M). Her main research interests include network and information security.
    HE Jiawei, born in 2001, postgraduate. His main research interests include source code vulnerability detection.
  • Supported by:
    National Natural Science Foundation of China (62377009), Major Project of Hubei Province (JD) (2023BAA018), Key Project of Hubei Provincial Key R&D Program (2021BAA184, 2021BAA188), Research Center for Performance Evaluation and Information Management of Key Research Bases for Humanities and Social Sciences in Hubei Provincial Colleges and Universities (2020JX01), and Major Science and Technology Special Project of Hubei Science and Technology Plan (2024BAA008).

Abstract: As software vulnerabilities continue to increase, system security faces severe challenges. Source code vulnerability detection can identify potential security threats in software applications during the development phase, which is crucial for ensuring their security. The mainstream approach today is based on deep learning models. However, many existing deep learning models rely on a single form of feature and fail to fully exploit both the global and the local information in source code semantics. They also tend to overlook the similarities and differences between samples, which leads to poor performance on complex vulnerability patterns, with high false positive and false negative rates. To address these issues, a dual-channel source code vulnerability detection model based on contrastive learning is proposed. The model uses separate channels to extract global and local features from source code semantics, and introduces contrastive learning so that the model learns the similarities and differences between samples and uses them to optimize feature extraction. Experiments on the real-world vulnerability datasets Devign and Reveal show that the model significantly improves recall and F1 score over the baseline models: on average by 14.65 and 6.30 percentage points, respectively, on Devign, and by 31.18 and 22.44 percentage points, respectively, on Reveal.

Key words: Source code vulnerability detection, Dual-channel network model, Contrastive learning, Cross attention, Feature fusion

CLC number: TP311
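
The abstract states that contrastive learning lets the model learn similarities and differences between samples, but it does not give the loss function. The sketch below illustrates one common choice, a supervised contrastive loss over labeled embeddings, as an assumption rather than the authors' implementation. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Average supervised contrastive loss over all anchors that have a positive.

    For each anchor i, samples with the same label are positives and every
    other sample appears in the softmax denominator. The loss is small when
    same-label samples are close and different-label samples are far apart.
    Assumption: this is an illustrative loss, not the one used in the paper.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        # Mean negative log-probability of the positives.
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two clusters in 2-D (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_labels = [1, 1, 0, 0]     # labels agree with the clusters
misaligned_labels = [1, 0, 1, 0]  # labels cut across the clusters

loss_aligned = supervised_contrastive_loss(toy_embeddings, aligned_labels)
loss_misaligned = supervised_contrastive_loss(toy_embeddings, misaligned_labels)
print(loss_aligned < loss_misaligned)
```

In this toy setting the loss is lower when samples with the same label (for example, vulnerable versus non-vulnerable functions) already sit close together in embedding space, which is the property a contrastive objective rewards during training.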