Computer Science, 2025, Vol. 52, Issue (1): 242-249. doi: 10.11896/jsjkx.240200046

• Computer Software

Just-In-Time Software Defect Prediction Approach Based on Fine-grained Code Representation and Feature Fusion

ZHU Xiaoyan, WANG Wenge, WANG Jiayin, ZHANG Xuanping   

  1. School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  • Received: 2024-02-19 Revised: 2024-07-02 Online: 2025-01-15 Published: 2025-01-09
  • About author: ZHU Xiaoyan, born in 1982, Ph.D., associate professor and Ph.D. supervisor, is a member of CCF (No. 73027M). Her main research interests include machine learning and data mining.
  • Supported by: National Natural Science Foundation of China (72274152).

Abstract: Just-in-time software defect prediction (JIT-SDP) aims to predict whether a software change is defect-prone at the moment it is first committed. Predictions are made for each individual change rather than at a coarser granularity such as files or modules. Owing to its immediacy and traceability, JIT-SDP has been widely used in fields such as continuous testing. However, existing JIT-SDP studies extract features from code changes at a coarse granularity: they merely mark which lines were changed, without any finer-grained tagging. Moreover, studies based on commit content are limited to simply concatenating the features extracted from commit messages and code changes, without deep alignment in the feature space. As a result, predictions are easily disturbed by noise when the quality of commit messages cannot be guaranteed. Existing methods also fail to exploit, at the same time, both the hand-crafted features designed by domain experts and the semantic and syntactic structure information in commit content, and therefore do not fully leverage the available features. To address these problems, a JIT-SDP approach based on fine-grained code changes and feature fusion is proposed. The method introduces new change embeddings to represent code changes at a fine granularity. A feature alignment module is designed to reduce the impact of noise in low-quality commit messages on performance. Meanwhile, neural networks are used to learn domain-specific knowledge from the hand-crafted features so that the existing features are fully utilized. Experimental results show that, compared with existing methods, the approach achieves significant improvements on three performance metrics.
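
To make the difference between marking changed lines and fine-grained tagging concrete, here is a minimal Python sketch. It is not the authors' change-embedding scheme, which the abstract does not specify: the whitespace tokenizer, the KEEP/ADD/DEL tag set, the TAG_IDS mapping and all function names are hypothetical. It uses the standard library's difflib.SequenceMatcher to align the old and new versions of a modified line and tags only the tokens that actually differ, whereas line-level marking flags every token on both lines.

```python
import difflib
from typing import List, Tuple

# Hypothetical tag vocabulary; the paper's actual change-embedding scheme is not given in the abstract.
KEEP, ADD, DEL = "KEEP", "ADD", "DEL"
TAG_IDS = {KEEP: 0, ADD: 1, DEL: 2}

Tagged = List[Tuple[str, str]]


def tokenize(line: str) -> List[str]:
    # Assumption: whitespace tokenization stands in for a real code lexer or subword tokenizer.
    return line.split()


def line_level_tags(old_line: str, new_line: str) -> Tagged:
    """Coarse marking: every token on the removed line is DEL, every token on the added line is ADD."""
    return [(tok, DEL) for tok in tokenize(old_line)] + [(tok, ADD) for tok in tokenize(new_line)]


def token_level_tags(old_line: str, new_line: str) -> Tagged:
    """Fine-grained tagging: align the token sequences and flag only tokens that actually differ."""
    old_toks, new_toks = tokenize(old_line), tokenize(new_line)
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
    tagged: Tagged = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            tagged.extend((tok, KEEP) for tok in old_toks[i1:i2])
        else:
            # "replace", "delete" and "insert": old-side tokens were removed, new-side tokens were added.
            tagged.extend((tok, DEL) for tok in old_toks[i1:i2])
            tagged.extend((tok, ADD) for tok in new_toks[j1:j2])
    return tagged


def to_model_inputs(tagged: Tagged) -> Tuple[List[str], List[int]]:
    """Split into parallel token and tag-id sequences, e.g. to index a tag embedding table."""
    tokens = [tok for tok, _ in tagged]
    tag_ids = [TAG_IDS[tag] for _, tag in tagged]
    return tokens, tag_ids


if __name__ == "__main__":
    old, new = "if ( x > 0 ) {", "if ( x >= 0 ) {"
    coarse = line_level_tags(old, new)
    fine = token_level_tags(old, new)
    print(sum(tag != KEEP for _, tag in coarse))  # 14: both whole lines are flagged
    print(sum(tag != KEEP for _, tag in fine))    # 2: only ">" (DEL) and ">=" (ADD)
    print(to_model_inputs(fine)[1])               # [0, 0, 0, 2, 1, 0, 0, 0]
```

In a learned model, such tag ids could index a small embedding table whose vectors are added to the token embeddings, much like segment embeddings in BERT-style encoders; this is an illustrative assumption about how fine-grained tags might be consumed, not a description of the proposed architecture.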

Key words: Just-in-time software defect prediction, Feature fusion, Software engineering, Deep learning, Code representation

CLC Number: TP311