Computer Science ›› 2017, Vol. 44 ›› Issue (8): 176-180, 206. doi: 10.11896/j.issn.1002-137X.2017.08.031

Cost-sensitive Software Defect Prediction Method Based on Boosting

YANG Jie, YAN Xue-feng and ZHANG De-ping   

Online: 2018-11-13 Published: 2018-11-13

Abstract: Boosting resampling is a common way to expand data sets when only small samples are available. First, to address the curse of dimensionality that arises during resampling, a random feature selection method is used to reduce the number of dimensions. In addition, because software defect prediction penalizes missed defects (false negatives) and false alarms (false positives) differently, a cost-sensitive algorithm is incorporated into the feature selection process. Using multi-normal k-NN as the weak learner and minimum cost as the selection criterion, a predictor consisting of the k value and the attribute subset for the current sampling set is obtained. Cost-sensitive theory is also introduced to update the weight vector during Boosting resampling, so that different instances receive corresponding weights. All the predictors are then combined into an adaptive ensemble k-NN learner, on which a software defect prediction model is built. Results on NASA data sets show that, with small samples, the model substantially reduces the rate of missed defects, while the false alarm rate increases to some extent. Overall, compared with the original Boosting-based learning, the cost-sensitive Boosting-based software defect prediction method greatly improves prediction performance.

Key words: Software defect prediction, Boosting, Cost-sensitive, Random feature selection, Ensemble k-NN
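
The cost-sensitive weight update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual update rule, so everything below is an assumption made for illustration: the cost values `COST_FN` and `COST_FP`, the function names, and the AdaBoost-style formula in which a misclassified instance's weight is additionally scaled by its misclassification cost. Label 1 is taken to mean "defective".

```python
import math

# Hypothetical costs (not from the paper): missing a defective module
# (false negative) is assumed to cost more than a false alarm.
COST_FN = 5.0
COST_FP = 1.0


def misclassification_cost(y_true, y_pred):
    """Return the assumed cost of one prediction (0 when it is correct)."""
    if y_true == y_pred:
        return 0.0
    return COST_FN if y_true == 1 else COST_FP


def update_weights(weights, y_true, y_pred):
    """One AdaBoost-style reweighting step with a cost factor.

    `weights` is assumed to sum to 1. Returns (new_weights, alpha).
    """
    # Weighted error of the current weak learner, clamped away from 0 and 1.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)

    new_weights = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            # Correctly classified: shrink the weight.
            new_weights.append(w * math.exp(-alpha))
        else:
            # Misclassified: grow the weight, scaled by the assumed cost.
            new_weights.append(w * math.exp(alpha) * misclassification_cost(t, p))

    # Renormalise so the weights form a distribution again.
    total = sum(new_weights)
    return [w / total for w in new_weights], alpha


if __name__ == "__main__":
    w, alpha = update_weights([0.2] * 5, [1, 0, 1, 0, 0], [0, 1, 1, 0, 0])
    print(w, alpha)
```

In this toy call the first instance is a missed defect and the second a false alarm, so after the update the missed defect carries the largest weight and is more likely to be drawn in the next resampling round.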

