Computer Science, 2025, Vol. 52, Issue (1): 232-241. doi: 10.11896/jsjkx.240100198

• Computer Software •

Feature Construction for Effort-aware Just-In-Time Software Defect Prediction Based on Multi-objective Optimization

ZHAO Chenyang, LIU Lei, JIANG He   

  1. School of Software, Dalian University of Technology, Dalian, Liaoning 116600, China
  • Received: 2024-01-29 Revised: 2024-06-22 Online: 2025-01-15 Published: 2025-01-09
  • About author: ZHAO Chenyang, born in 1999, postgraduate. His main research interests include just-in-time software testing.
    JIANG He, born in 1980, professor, Ph.D. supervisor, is a member of CCF (No. 08846D). His main research interests include system software and intelligent software engineering.

Abstract: Just-in-time software defect prediction (JIT-SDP) is a software defect prediction technology for code changes, with the advantages of fine granularity, instantaneity, and traceability. Effort-aware JIT-SDP further considers the cost of code inspection and aims to detect more defective code changes with limited testing resources. Although many effort-aware JIT-SDP methods have been proposed, most of them only optimize the model algorithm. To improve the performance and generalizability of effort-aware JIT-SDP, this paper proposes EEF, an effort-aware evolutionary feature construction method, which is the first to address the problem from the perspective of feature engineering. First, EEF represents features as genetic programming trees. A new feature transformation is then obtained through an evolutionary feature construction method based on multi-objective optimization, with classification performance and effort-aware performance as the two objectives. Finally, a new feature set is constructed with the obtained feature transformation, and the classification model is trained and tested on this new feature set. To verify the effectiveness of EEF, experiments are conducted under three different evaluation schemes on six open source datasets. The results show that EEF improves the performance of the classification model in effort-aware scenarios and performs better than other feature engineering methods. Moreover, provided that the diversity of feature selection is ensured, EEF based on a single model can also improve the performance of other models.
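The idea of scoring tree-encoded features on two objectives can be illustrated with a minimal Python sketch. It is not the EEF implementation: the function set (`add`, `sub`, `mul`), the two measures (plain accuracy and recall within 20% of the inspection effort), the toy data, and the one-shot non-dominated filter standing in for a full evolutionary loop are all assumptions made for illustration, since the abstract does not specify them.

```python
import operator

# Hypothetical function set for internal tree nodes (an assumption, not from the paper).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, row):
    """Evaluate an expression tree on one code change (a list of raw metrics)."""
    if isinstance(tree, int):              # leaf: index of a raw feature
        return row[tree]
    op, left, right = tree                 # internal node: (operator, left, right)
    return OPS[op](evaluate(left, row), evaluate(right, row))

def accuracy(scores, labels, threshold):
    """Classification objective: share of changes labelled correctly at a cutoff."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall_at_effort(scores, labels, efforts, budget=0.2):
    """Effort-aware objective: share of defective changes found when changes are
    inspected in descending score-per-effort order until the budget is reached."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / efforts[i], reverse=True)
    limit, spent, found = budget * sum(efforts), 0.0, 0
    for i in order:
        if spent + efforts[i] > limit:     # next change would exceed the budget
            break
        spent += efforts[i]
        found += labels[i]
    return found / max(1, sum(labels))

def pareto_front(candidates):
    """Keep (name, objectives) pairs that no other candidate dominates
    (both objectives are maximized)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b
    return [c for c in candidates
            if not any(dominates(o[1], c[1]) for o in candidates)]

# Toy data (made up): two raw metrics per change, a defect label, an effort value.
rows = [[10, 1], [200, 8], [15, 2], [120, 5], [30, 1], [90, 6]]
labels = [0, 1, 0, 1, 0, 1]
efforts = [12, 180, 20, 100, 40, 60]

# Candidate constructed features, each encoded as a small expression tree.
trees = {"f0": 0, "f1": 1, "f0*f1": ("mul", 0, 1), "f1-f0": ("sub", 1, 0)}

candidates = []
for name, tree in trees.items():
    scores = [evaluate(tree, r) for r in rows]
    cutoff = sorted(scores)[len(scores) // 2 - 1]   # lower median as a crude threshold
    candidates.append((name, (accuracy(scores, labels, cutoff),
                              recall_at_effort(scores, labels, efforts))))

front = pareto_front(candidates)   # features not dominated on both objectives
```

In an evolutionary setting, the trees would be varied by crossover and mutation and the non-dominated filter applied generation after generation; here the filter is applied once to a fixed candidate set to keep the sketch short.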

Key words: Just-in-time defect prediction, Effort-aware, Evolutionary feature construction, Multi-objective optimization, Feature engineering

CLC Number: 

  • TP311