Computer Science, 2025, Vol. 52, Issue (1): 242-249. doi: 10.11896/jsjkx.240200046

• Computer Software •

Just-In-Time Software Defect Prediction Approach Based on Fine-grained Code Representation and Feature Fusion

ZHU Xiaoyan, WANG Wenge, WANG Jiayin, ZHANG Xuanping

  1. School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  • Received: 2024-02-19 Revised: 2024-07-02 Online: 2025-01-15 Published: 2025-01-09
  • Corresponding author: ZHU Xiaoyan (zhu.xy@xjtu.edu.cn)
  • About author: ZHU Xiaoyan, born in 1982, Ph.D, associate professor, Ph.D supervisor, is a member of CCF (No.73027M). Her main research interests include machine learning and data mining.
  • Supported by:
    National Natural Science Foundation of China (72274152).

Abstract: Just-in-time software defect prediction (JIT-SDP) aims to predict how likely a software change is to introduce defects at the time it is first committed. Predictions are made for a single program change rather than at a coarse granularity. Owing to its immediacy and traceability, the technique has been widely used in areas such as continuous testing. Existing JIT-SDP studies represent code changes at a coarse granularity, merely marking the changed lines without fine-grained tagging. Moreover, approaches that use commit content simply concatenate the features extracted from commit messages and code changes, without deep alignment in feature space, so predictions are easily disturbed by noise when commit message quality is uneven. Existing methods also do not combine the hand-crafted features designed by domain experts with the semantic and syntactic information in the change content. To address these problems, a JIT-SDP approach based on fine-grained code representation and feature fusion is proposed. The approach introduces a new change embedding computation to represent code changes at a fine granularity. A feature alignment module is introduced to reduce the impact of noise in commit messages on performance. In addition, neural networks are used to learn expert knowledge from the hand-crafted features so that existing features are fully used for prediction. Experimental results show that, compared with existing methods, the approach improves significantly on three performance metrics.

Key words: Just-in-time software defect prediction, Feature fusion, Software engineering, Deep learning, Code representation

CLC Number:

  • TP311
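
The abstract contrasts marking whole changed lines with fine-grained tagging of a change. As a rough illustration of that distinction only, and not the authors' change embedding (which this page does not describe), the sketch below tags each token of a modified line as kept, deleted, or added using Python's standard `difflib`. The names `tokenize` and `tag_change`, the tokenizer regex, and the `KEEP`/`DEL`/`ADD` labels are all hypothetical choices made for this example.

```python
import difflib
import re

# Hypothetical tokenizer: identifiers, integers, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split one source line into coarse lexical tokens."""
    return TOKEN_RE.findall(line)


def tag_change(old_line, new_line):
    """Tag tokens of a changed line pair as KEEP, DEL, or ADD.

    A line-level diff would mark both lines as changed in full;
    here only the tokens that actually differ are marked.
    """
    old_toks, new_toks = tokenize(old_line), tokenize(new_line)
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
    tagged = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            tagged += [(tok, "KEEP") for tok in new_toks[j1:j2]]
        else:
            # "replace", "delete", and "insert" reduce to removed old
            # tokens followed by added new tokens (either may be empty).
            tagged += [(tok, "DEL") for tok in old_toks[i1:i2]]
            tagged += [(tok, "ADD") for tok in new_toks[j1:j2]]
    return tagged


if __name__ == "__main__":
    for token, label in tag_change("return a + b", "return a - b"):
        print(label, token)
```

In this example only the operator is tagged as deleted and added, while `return`, `a`, and `b` are tagged as kept; a model consuming such tags could, in principle, attend to the edited tokens rather than to the whole line.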