Computer Science, 2018, Vol. 45, Issue (8): 160-165. doi: 10.11896/j.issn.1002-137X.2018.08.029

• Software and Database Technology •

Software Defect Prediction Based on Improved Deep Forest Algorithm

XUE Can-guan, YAN Xue-feng

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received: 2017-07-11; Online: 2018-08-29; Published: 2018-08-29
  • About the authors: XUE Can-guan (born 1986), male, master's student; research interest: system modeling and simulation; E-mail: 277815319@qq.com. YAN Xue-feng (born 1975), male, Ph.D., professor; research interests: software engineering methodology, system modeling and simulation; E-mail: yxf@nuaa.edu.cn (corresponding author).
  • Funding:
    This work was supported by the 13th Five-Year Plan Key Basic Research Project (JCKY2016206B001), the 13th Five-Year Plan Equipment Pre-research Project (41401010201), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu Province.

Abstract: Software defect prediction is an important means of allocating software testing resources rationally and improving software performance. To address the problem that the shallow machine learning algorithms used in software defect prediction models cannot deeply mine the features of software data, an improved deep forest algorithm, deep stacking forest (DSF), is proposed. The algorithm first transforms the original software features by random sampling to enhance their expressive power, and then uses a stacking structure to perform layer-by-layer representation learning on the transformed features. DSF was applied to defect prediction on the Eclipse dataset. The experimental results show that it clearly outperforms deep forest in both prediction performance and time efficiency.

Key words: Deep forest, Deep stacking forest, Random sampling, Software defect prediction, Stacking structure
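The abstract describes DSF only at a high level, so the following is a minimal Python sketch of the two-stage idea under stated assumptions, not the authors' implementation. It assumes that "random sampling" means fitting learners on randomly sampled feature subsets and using their class-probability vectors as the transformed features, and that each stacking layer receives those transformed features plus the previous layer's class vectors, as in deep forest's cascade. `NearestCentroid` is a hypothetical toy stand-in for the random forests and completely-random tree forests that deep forest uses; the class names `RandomSubspaceEnsemble` and `DeepStackingForestSketch` and the parameters `n_layers`, `n_learners` and `subset_size` are illustrative only. For brevity it also omits the k-fold cross-validation deep forest uses to generate training-time class vectors, and it fixes the depth instead of choosing it on validation data.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


class NearestCentroid:
    """Hypothetical toy base learner standing in for the forests deep forest uses."""

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "NearestCentroid":
        self.classes = sorted(set(y))
        self.centroids: Dict[int, Vector] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: Vector) -> Vector:
        # Softmax over negative squared distances to each class centroid.
        scores = [-sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
                  for c in self.classes]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


class RandomSubspaceEnsemble:
    """Learners fitted on randomly sampled feature subsets; transform() concatenates
    their class-probability vectors into a new feature vector."""

    def __init__(self, n_learners: int, subset_size: int, rng: random.Random):
        self.n_learners, self.subset_size, self.rng = n_learners, subset_size, rng

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "RandomSubspaceEnsemble":
        n_features = len(X[0])
        self.subsets = [self.rng.sample(range(n_features), self.subset_size)
                        for _ in range(self.n_learners)]
        self.models = [NearestCentroid().fit([[x[i] for i in s] for x in X], y)
                       for s in self.subsets]
        return self

    def transform(self, x: Vector) -> Vector:
        out: Vector = []
        for s, model in zip(self.subsets, self.models):
            out.extend(model.predict_proba([x[i] for i in s]))
        return out


class DeepStackingForestSketch:
    """Step 1: random-sampling feature transform. Step 2: stacked layers, each fed the
    step-1 features plus the previous layer's class vectors."""

    def __init__(self, n_layers: int = 2, n_learners: int = 3, subset_size: int = 2,
                 seed: int = 0):
        if n_layers < 1:
            raise ValueError("need at least one stacking layer")
        self.n_layers, self.n_learners, self.subset_size = n_layers, n_learners, subset_size
        self.rng = random.Random(seed)

    def _new_ensemble(self) -> RandomSubspaceEnsemble:
        return RandomSubspaceEnsemble(self.n_learners, self.subset_size, self.rng)

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "DeepStackingForestSketch":
        self.classes = sorted(set(y))
        self.sampler = self._new_ensemble().fit(X, y)
        base = [self.sampler.transform(x) for x in X]
        inputs = base
        self.layers: List[RandomSubspaceEnsemble] = []
        for _ in range(self.n_layers):
            layer = self._new_ensemble().fit(inputs, y)
            self.layers.append(layer)
            # Next layer's input: step-1 features + this layer's class vectors.
            inputs = [b + layer.transform(z) for b, z in zip(base, inputs)]
        return self

    def predict(self, x: Vector) -> int:
        base = self.sampler.transform(x)
        z, out = base, base
        for layer in self.layers:
            out = layer.transform(z)
            z = base + out
        # Average the last layer's class vectors; return the most probable class.
        k = len(self.classes)
        avg = [sum(out[j::k]) / self.n_learners for j in range(k)]
        return self.classes[avg.index(max(avg))]


if __name__ == "__main__":
    # Hypothetical toy data: 4 metrics per module; label 1 = defective.
    X = [[1, 2, 1, 2], [2, 1, 2, 1], [1, 1, 2, 2], [2, 2, 1, 1],
         [9, 8, 9, 8], [8, 9, 8, 9], [9, 9, 8, 8], [8, 8, 9, 9]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model = DeepStackingForestSketch(seed=0).fit(X, y)
    print(model.predict([9, 9, 9, 9]))  # -> 1
```

Passing the step-1 features into every layer, rather than only the previous layer's outputs, follows deep forest's cascade, in which each level's class vectors are concatenated with the input feature vector before being passed to the next level.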

CLC number: TP311