Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210800160-7. doi: 10.11896/jsjkx.210800160

• Software Engineering •

Multi-source Cross-project Defect Prediction with Data Selection

DENG Jian-hua, WANG Wei   

  1. School of Software, Yunnan University, Kunming 650091, China
  • Online: 2022-11-10 Published: 2022-11-21
  • About author: DENG Jian-hua, born in 1997, is a master's candidate and a member of the China Computer Federation. His main research interest is software defect prediction.
    WANG Wei, born in 1979, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include software engineering, machine learning and formal methods.
  • Supported by:
    Young and Middle-aged Academic and Technical Leader Candidate Project of Yunnan Province (2019HB104).

Abstract: Multi-source cross-project defect prediction (MCPDP) uses historical data from multiple other projects (source projects) to predict the likelihood of defects in the software modules of a target project. This line of research addresses the cold-start problem of defect prediction modeling and offers a way to build defect prediction models for new software or for software systems that lack historical data. Source data selection is considered an effective way to further improve the accuracy of cross-project defect prediction. This paper therefore studies a multi-source cross-project defect prediction method with data selection. The method consists of two steps: 1) feature alignment of the source data; 2) source data screening based on an improved maximum mean measure. To verify the effectiveness of the proposed method, experiments are carried out on four public datasets: AEEEM, Relink, NASA and SOFTLAB. The results show that the proposed method improves the F-measure by 4% and 5%, respectively, compared with the baseline method, which indicates that the proposed method performs well.
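
The source screening step can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the improved measure, so the code below assumes the standard (biased) maximum mean discrepancy with a Gaussian kernel, and it assumes the features have already been aligned so that every source project and the target share the same columns. The names rbf_kernel, mmd2 and select_sources, the bandwidth default, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(source, target, gamma=None):
    """Biased estimate of the squared maximum mean discrepancy."""
    if gamma is None:
        gamma = 1.0 / source.shape[1]  # assumed default bandwidth
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


def select_sources(sources, target, k=1):
    """Return the names of the k source projects closest to the target."""
    scores = {name: mmd2(x, target) for name, x in sources.items()}
    return sorted(scores, key=scores.get)[:k]


# Synthetic stand-ins for aligned module metrics (rows = modules).
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 5))
sources = {
    "near": rng.normal(0.1, 1.0, size=(200, 5)),
    "far": rng.normal(3.0, 1.0, size=(200, 5)),
}
print(select_sources(sources, target, k=1))
```

In this sketch a smaller score means the source distribution is closer to the target, so the project drawn from a nearly identical distribution is ranked ahead of the shifted one. A classifier would then be trained only on the selected source projects.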

Key words: Multi-source domain, Cross-project, Defect prediction, Data selection, Feature alignment

CLC Number: 

  • TP311