Computer Science ›› 2025, Vol. 52 ›› Issue (1): 232-241. doi: 10.11896/jsjkx.240100198

• Computer Software •

  • Corresponding author: JIANG He (jianghe@dlut.edu.cn)
  • Author e-mail: ZHAO Chenyang (zcy19998006@mail.dlut.edu.cn)

Feature Construction for Effort-aware Just-In-Time Software Defect Prediction Based on Multi-objective Optimization

ZHAO Chenyang, LIU Lei, JIANG He   

  1. School of Software, Dalian University of Technology, Dalian, Liaoning 116600, China
  • Received: 2024-01-29 Revised: 2024-06-22 Published: 2025-01-15 Online: 2025-01-09
  • About author: ZHAO Chenyang, born in 1999, postgraduate. His main research interests include just-in-time software testing.
    JIANG He, born in 1980, professor, Ph.D. supervisor, is a member of CCF (No. 08846D). His main research interests include system software and intelligent software engineering.

Abstract: Just-in-time software defect prediction (JIT-SDP) is a defect prediction technique that operates on code changes and offers fine granularity, immediacy, and traceability. Effort-aware JIT-SDP additionally accounts for code inspection effort and aims to identify more defective changes within a limited effort budget. Although many effort-aware JIT-SDP methods have been proposed, most of them optimize only the classification model or algorithm. To improve the performance and generality of effort-aware JIT-SDP, the problem is approached from the feature engineering side for the first time, and EEF, an evolutionary feature construction method for effort-aware scenarios, is proposed. EEF first represents features as genetic programming trees and, considering both classification performance and effort-aware performance, uses evolutionary feature construction based on multi-objective optimization to obtain new feature transformations. The obtained transformations are then applied to build a new feature set, on which the classification model is trained and tested. To verify the effectiveness of EEF, experiments are conducted on six open-source projects under three different evaluation schemes. The results show that EEF improves the performance of classification models in effort-aware scenarios and outperforms other feature engineering methods. Moreover, provided that the diversity of feature selection is maintained, EEF built on a single model can also improve the performance of other models.
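
The idea in the abstract (features as expression trees, scored on a classification objective and an effort-aware objective at the same time) can be illustrated with a small sketch. Everything concrete below is an assumption for illustration rather than the configuration used by EEF: the operator set, the metric names `la`, `ld`, and `nf`, the use of changed lines as inspection effort, pairwise AUC as the classification objective, and recall within 20% of total effort as the effort-aware objective. The evolutionary search itself (crossover, mutation, selection) is omitted; only tree evaluation and non-dominated filtering are shown.

```python
import operator

# Hypothetical primitive set; the abstract does not list the operators EEF uses.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, row):
    """Evaluate a feature tree on one code change (a dict of raw metrics)."""
    if isinstance(tree, str):            # leaf: name of an original feature
        return row[tree]
    op, left, right = tree               # internal node: (operator, subtree, subtree)
    return OPS[op](evaluate(left, row), evaluate(right, row))

def auc(scores, labels):
    """Pairwise AUC: chance that a defective change outscores a clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall_at_effort(scores, labels, efforts, budget=0.2):
    """Share of defective changes found when inspecting in descending score
    order and stopping at the first change that would exceed the budget."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(efforts)
    spent, found = 0.0, 0
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += labels[i]
    return found / sum(labels)

def pareto_front(objectives):
    """Names of candidates that no other candidate dominates (both objectives maximised)."""
    front = []
    for name, a in objectives.items():
        dominated = any(
            b[0] >= a[0] and b[1] >= a[1] and b != a
            for other, b in objectives.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front

# Toy data: lines added (la), lines deleted (ld), files touched (nf).
rows = [
    {"la": 10, "ld": 2, "nf": 1},
    {"la": 50, "ld": 10, "nf": 4},
    {"la": 5, "ld": 1, "nf": 3},
    {"la": 100, "ld": 40, "nf": 2},
    {"la": 8, "ld": 0, "nf": 5},
]
labels = [0, 1, 1, 0, 1]                          # 1 = defect-inducing change
efforts = [r["la"] + r["ld"] for r in rows]       # assumption: effort = changed lines

# Three candidate constructed features, written as trees.
candidates = {
    "nf": "nf",
    "nf-la": ("sub", "nf", "la"),
    "la-ld": ("sub", "la", "ld"),
}

objectives = {}
for name, tree in candidates.items():
    scores = [evaluate(tree, r) for r in rows]
    objectives[name] = (auc(scores, labels), recall_at_effort(scores, labels, efforts))

front = pareto_front(objectives)
print(front)
```

On this toy data the candidates `nf` and `nf-la` trade off against each other (one ranks better overall, the other finds more defects within the effort budget), so both stay on the front, while `la-ld` is worse on both objectives and is filtered out. A multi-objective evolutionary algorithm would repeat this scoring and filtering over many generations of mutated and recombined trees.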

Key words: Just-in-time defect prediction, Effort-aware, Evolutionary feature construction, Multi-objective optimization, Feature engineering

CLC Number: TP311