Computer Science ›› 2017, Vol. 44 ›› Issue (8): 176-180, 206. doi: 10.11896/j.issn.1002-137X.2017.08.031


Cost-sensitive Software Defect Prediction Method Based on Boosting

YANG Jie, YAN Xue-feng and ZHANG De-ping   

  • Online: 2018-11-13 Published: 2018-11-13

Abstract: Boosting resampling is a common way to enlarge small-sample data sets. First, to counter the curse of dimensionality that arises during resampling, a random feature selection method is used to reduce the number of dimensions. In addition, because software defect prediction penalizes missed defects (false negatives) and false alarms (false positives) differently, a cost-sensitive algorithm is incorporated into the feature selection process. With multi-normal k-NN as the weak learner and minimum cost as the selection criterion, a predictor consisting of a k value and an attribute subset is obtained for the current sampling set. Cost-sensitive theory is also used to update the weight vector during Boosting resampling, so that different instances receive corresponding weights. All the predictors are then combined into an adaptive ensemble k-NN learner, from which the software defect prediction model is built. Results on NASA data sets show that, for small samples, the model substantially reduces the rate of missed defects while the rate of false alarms increases to some extent. Overall, compared with the original Boosting-based learning, the cost-sensitive Boosting-based software defect prediction method greatly improves prediction performance.
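
The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes an AdaBoost-style learner weight `alpha`, a weight update that scales misclassified instances by their cost, and illustrative costs `c_fn=5.0` (missed defect) and `c_fp=1.0` (false alarm). The names `knn_predict`, `cost`, `reweight`, `fit`, and `predict` are hypothetical.

```python
import math
import random

def knn_predict(train_X, train_y, x, k, feats):
    """Majority vote of the k nearest training points, using only `feats`."""
    dists = sorted(
        (sum((a[f] - x[f]) ** 2 for f in feats), label)
        for a, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > min(k, len(dists)) else 0

def cost(y_true, y_pred, c_fn, c_fp):
    """Assumed cost model: a missed defect costs c_fn, a false alarm c_fp."""
    if y_true == y_pred:
        return 0.0
    return c_fn if y_true == 1 else c_fp

def reweight(w, y, preds, alpha, c_fn, c_fp):
    """Assumed cost-sensitive update: errors grow with their cost, hits shrink."""
    new_w = [
        wi * math.exp(alpha * cost(yi, pi, c_fn, c_fp) if yi != pi else -alpha)
        for wi, yi, pi in zip(w, y, preds)
    ]
    z = sum(new_w)
    return [v / z for v in new_w]

def fit(X, y, rounds=5, ks=(1, 3, 5), n_feats=2, c_fn=5.0, c_fp=1.0, seed=0):
    """Build an ensemble of (alpha, k, feature subset, sample) predictors."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Boosting resampling: draw the round's sample according to the weights.
        idx = rng.choices(range(n), weights=w, k=n)
        sX, sy = [X[i] for i in idx], [y[i] for i in idx]
        # Random feature selection: one random attribute subset per round.
        feats = rng.sample(range(d), min(n_feats, d))
        # Choose the k value with the minimum weighted cost on the full set.
        best = None
        for k in ks:
            preds = [knn_predict(sX, sy, x, k, feats) for x in X]
            total = sum(
                wi * cost(yi, pi, c_fn, c_fp) for wi, yi, pi in zip(w, y, preds)
            )
            if best is None or total < best[0]:
                best = (total, k, preds)
        _, k, preds = best
        err = sum(wi for wi, yi, pi in zip(w, y, preds) if yi != pi)
        if err >= 0.5:
            continue  # no better than chance: skip this round
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, k, feats, sX, sy))
        w = reweight(w, y, preds, alpha, c_fn, c_fp)
    return ensemble

def predict(ensemble, x):
    """Alpha-weighted vote of all k-NN predictors; 1 means defect-prone."""
    score = 0.0
    for alpha, k, feats, sX, sy in ensemble:
        score += alpha if knn_predict(sX, sy, x, k, feats) == 1 else -alpha
    return 1 if score > 0 else 0
```

Calling `fit(X, y)` on a list of numeric feature vectors with 0/1 labels returns the ensemble, and `predict(ensemble, x)` classifies a new module. Raising `c_fn` relative to `c_fp` pushes later rounds to concentrate on missed defects, which matches the trade-off the abstract reports: fewer missed defects at the price of more false alarms.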

Key words: Software defect prediction, Boosting, Cost-sensitive, Random feature selection, Ensemble k-NN

