Computer Science (计算机科学), 2025, Vol. 52, Issue 1: 232-241. doi: 10.11896/jsjkx.240100198
ZHAO Chenyang (赵晨阳), LIU Lei (刘磊), JIANG He (江贺)
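Abstract: Just-in-time software defect prediction (JIT-SDP) predicts defects at the level of individual code changes, which offers fine granularity, timeliness, and traceability. Effort-aware JIT-SDP additionally accounts for the effort required to inspect code, with the goal of identifying as many defect-inducing changes as possible under a limited inspection budget. Although many effort-aware JIT-SDP methods exist, most of them optimize only the classification algorithm. To improve both the performance and the generality of effort-aware JIT-SDP, this paper approaches the problem from the feature engineering side for the first time and proposes EEF, an evolutionary feature construction method for the effort-aware scenario. First, EEF represents features as genetic programming trees and obtains new feature transformations through multi-objective evolutionary feature construction that optimizes two objectives: classification performance and effort-aware performance. Then, the resulting transformations are used to build a new feature set, on which classification models are trained and tested. To verify the effectiveness of EEF, experiments were conducted on six open-source projects under three different evaluation schemes. The results show that EEF improves the performance of classification models in the effort-aware scenario and outperforms other feature engineering methods. Moreover, as long as diversity in feature selection is preserved, EEF built on a single model also improves the performance of other models.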
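The abstract describes EEF only at a high level. The following is a minimal, hypothetical Python sketch of two of its ideas under stated assumptions: a GP tree encodes a feature transformation, and candidate transformations are compared on two objectives via Pareto dominance. The tree encoding, the primitive set, the objective functions (rank-based AUC standing in for classification performance; recall of defect-inducing changes within 20% of total inspection effort standing in for effort-aware performance), and the toy data are all assumptions for illustration, not the paper's definitions. There is no evolution loop (crossover, mutation), and each transformed feature is used directly as a ranking score instead of training a classifier on the constructed feature set; the sketch only shows the non-dominated filtering step that NSGA-II-style multi-objective algorithms rely on.

```python
import operator

# Hypothetical GP primitive set; protected division returns 1.0 on a zero divisor.
def pdiv(a, b):
    return a / b if b != 0 else 1.0

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul, "div": pdiv}

def evaluate(tree, row):
    """A tree is ("x", i) for input feature i, or (op, left, right) for a binary operator."""
    if tree[0] == "x":
        return row[tree[1]]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def auc(scores, labels):
    """Rank-based AUC: share of (defective, clean) pairs where the defective change
    scores higher; ties count 0.5. Returns 0.5 if either class is missing."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall_at_effort(scores, labels, efforts, budget=0.2):
    """Share of defective changes found when changes are inspected in descending score
    order, stopping at the first change that would exceed `budget` of total effort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    limit = budget * sum(efforts)
    spent, found = 0.0, 0
    for i in order:
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += labels[i]
    total = sum(labels)
    return found / total if total else 0.0

def dominates(a, b):
    """Maximization: a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates, X, labels, efforts):
    """Score each tree on (AUC, recall within the effort budget) and keep non-dominated trees."""
    scored = []
    for tree in candidates:
        scores = [evaluate(tree, row) for row in X]
        objectives = (auc(scores, labels), recall_at_effort(scores, labels, efforts))
        scored.append((tree, objectives))
    return [(t, f) for t, f in scored if not any(dominates(g, f) for _, g in scored)]

# Toy data (made up): each row is [lines_added, files_touched]; effort = lines added.
X = [[10, 3], [200, 2], [15, 4], [300, 1], [40, 1], [250, 6]]
labels = [1, 0, 1, 0, 0, 1]          # 1 = defect-inducing change
efforts = [row[0] for row in X]
candidates = [
    ("x", 0),                        # raw lines added
    ("x", 1),                        # raw files touched
    ("div", ("x", 1), ("x", 0)),     # files touched per added line
]
for tree, (a, r) in pareto_front(candidates, X, labels, efforts):
    print(tree, round(a, 3), round(r, 3))
```

On this toy data the raw lines-added tree is dominated by the files-touched tree and is dropped, while the files-touched tree (best AUC, but no defects found within the effort budget) and the ratio tree (slightly lower AUC, two of three defects found within the budget) both remain on the front. This trade-off between classification quality and effort-aware quality is why a multi-objective search keeps a set of transformations rather than a single best one.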