Computer Science ›› 2017, Vol. 44 ›› Issue (8): 176-180. doi: 10.11896/j.issn.1002-137X.2017.08.031

• Software and Database Technology •

基于Boosting的代价敏感软件缺陷预测方法

杨杰,燕雪峰,张德平   

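  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106
  • Publication date: 2018-11-13 Release date: 2018-11-13
  • Supported by:
    This work was supported by the 13th Five-Year Plan Key Basic Research Project (JCKY2016206B001), the Six Talent Peaks Project of Jiangsu Province (XXRJ-004), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.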

Cost-sensitive Software Defect Prediction Method Based on Boosting

YANG Jie, YAN Xue-feng and ZHANG De-ping   

  • Online:2018-11-13 Published:2018-11-13

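Abstract: Boosting resampling is a commonly used method for expanding small-sample data sets. First, to address the curse of dimensionality that arises during sampling, a random attribute-subset selection method is proposed for dimensionality reduction. Then, because software defect prediction applies different penalty factors to missed defects and false alarms, a cost-sensitive algorithm is added to the attribute selection process. With multiple basic k-NN predictors as weak learners and minimum cost as the attribute-removal criterion, a predictor set consisting of the k value and attribute subset for the current sample set is obtained. A cost-sensitive weight-update mechanism assigns corresponding weights to the different data instances during sampling, and all predictor sets together form an adaptive ensemble k-NN strong learner on which the software defect prediction model is built. Experimental results on NASA data sets show that, with small samples, the Boosting-based cost-sensitive software defect prediction method reduces the miss rate considerably while the false alarm rate increases to some extent, and its overall performance is better than that of the original Boosting ensemble prediction method.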

Keywords: software defect prediction, Boosting, cost-sensitive, random attribute selection, ensemble k-NN

Abstract: Boosting resampling is a common method for expanding small-sample data sets. First, to counter the curse of dimensionality that arises during resampling, a random feature selection method is used to reduce the number of dimensions. In addition, because software defect prediction penalizes missed defects (false negatives) and false alarms (false positives) differently, a cost-sensitive algorithm is added to the feature selection process. Using multiple basic k-NN predictors as weak learners and taking minimum cost as the selection principle, a predictor consisting of the k value and attribute subset of the current sample set is obtained. Cost-sensitive theory is introduced to update the weight vector during Boosting resampling, so that different instances are given corresponding weights. An adaptive ensemble k-NN learner is constructed from all the predictors, and a software defect prediction model is established. Results on NASA data sets show that, with small samples, the model substantially reduces the miss rate while the false alarm rate increases to some extent. Overall, the cost-sensitive software defect prediction method based on Boosting performs better than the original Boosting-based ensemble learner.

Key words: Software defect prediction, Boosting, Cost-sensitive, Random feature selection, Ensemble k-NN
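
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost values `C_FN` and `C_FP`, the exponential weight update, the candidate `ks`, and all function names are assumptions made for illustration, and the attribute-removal step is simplified to drawing one random attribute subset per round.

```python
import math
import random

C_FN, C_FP = 5.0, 1.0  # assumed penalties: missed defect vs. false alarm

def knn_predict(X, y, x, k, feats):
    """Majority vote of the k nearest rows, using only the attributes in feats."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((X[i][f] - x[f]) ** 2 for f in feats))
    votes = sum(y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0  # 1 = defective, 0 = clean

def instance_cost(label):
    """Cost of misclassifying an instance that has this true label."""
    return C_FN if label == 1 else C_FP

def weighted_cost(y_true, y_pred, w):
    """Weight- and cost-scaled error of one weak learner."""
    return sum(wi * instance_cost(t)
               for t, p, wi in zip(y_true, y_pred, w) if t != p)

def update_weights(y_true, y_pred, w, alpha):
    """Raise the weights of errors in proportion to their cost, then normalize."""
    new = [wi * math.exp(alpha * instance_cost(t) if t != p else -alpha)
           for t, p, wi in zip(y_true, y_pred, w)]
    total = sum(new)
    return [wi / total for wi in new]

def fit(X, y, rounds=10, ks=(1, 3, 5), n_feats=2, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # entries: (alpha, sample_X, sample_y, k, feats)
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=w, k=n)  # weighted resampling
        sX, sy = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(d), n_feats)        # random attribute subset
        # Pick the k with the lowest weighted cost on the full training set.
        best = min(ks, key=lambda k: weighted_cost(
            y, [knn_predict(sX, sy, x, k, feats) for x in X], w))
        pred = [knn_predict(sX, sy, x, best, feats) for x in X]
        err = sum(wi for t, p, wi in zip(y, pred, w) if t != p)
        if err >= 0.5:
            continue  # no better than chance: skip this round
        alpha = 0.5 * math.log((1 - err + 1e-9) / (err + 1e-9))
        model.append((alpha, sX, sy, best, feats))
        w = update_weights(y, pred, w, alpha)
    return model

def predict(model, x):
    """Alpha-weighted vote of all stored k-NN predictors."""
    score = sum(a * (1 if knn_predict(sX, sy, x, k, f) == 1 else -1)
                for a, sX, sy, k, f in model)
    return 1 if score > 0 else 0

def miss_and_false_alarm(y_true, y_pred):
    """Return (miss rate, false alarm rate)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    return fn / pos, fp / (len(y_true) - pos)

# Toy data (made up for illustration): three attributes per module.
X = [[1, 2, 0], [2, 1, 1], [1, 1, 0], [2, 2, 1],
     [8, 9, 1], [9, 8, 0], [1, 3, 1], [3, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 0, 0]
model = fit(X, y)
preds = [predict(model, x) for x in X]
print(miss_and_false_alarm(y, preds))
```

Setting `C_FN` higher than `C_FP` makes a missed defective module gain weight faster than a false alarm, so later rounds concentrate on defective modules. This is one plausible way to trade a lower miss rate for a somewhat higher false alarm rate, the pattern the abstract reports.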

