Computer Science, 2019, Vol. 46, Issue (11): 176-180. doi: 10.11896/jsjkx.180901685

• Software & Database Technology •

Ensemble Model for Software Defect Prediction

HU Meng-yuan1, HUANG Hong-yun2, DING Zuo-hua3   

  1. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
  2. Center of Multimedia Big Data of Library, Zhejiang Sci-Tech University, Hangzhou 310018, China
  3. School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
  • Received: 2018-09-09 Online: 2019-11-15 Published: 2019-11-14

Abstract: Software defect prediction aims to identify defective modules effectively. Traditional classifiers predict well on class-balanced data, but when the class proportions are imbalanced they are biased toward the majority class and tend to misclassify minority-class modules. In practice, the data in software defect prediction are often imbalanced. To address this class imbalance problem, this paper proposes an ensemble model based on improved class-weight self-adaptation, soft voting and threshold moving. The model handles class imbalance in both the training stage and the decision stage without changing the original data sets. First, in the class-weight learning stage, the optimal weights of the different classes are obtained through adaptive class-weight learning. Then, in the training stage, three base classifiers are trained with the optimal weights obtained in the previous step and combined by the soft ensemble (soft voting) method. Finally, in the decision stage, the final predicted class is determined by the threshold-moving model. To validate the proposed method, the NASA and Eclipse software defect benchmark data sets are used for prediction, and the method is compared with several software defect prediction methods proposed in recent years in terms of recall (Pd), false positive rate (Pf) and F-measure (F1). The experimental results show that the proposed method improves Pd and F-measure by 0.09 and 0.06 on average, respectively. The proposed method therefore handles class imbalance in software defect prediction better overall than the compared methods.
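
The decision stage and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class-weight learning step is omitted, the three probability lists stand in for the outputs of three already-trained base classifiers, and the function names, the toy data and the threshold value 0.35 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def soft_vote(probas: Sequence[Sequence[float]]) -> List[float]:
    """Average the defect probabilities of several base classifiers.

    probas[i][j] is classifier i's estimated probability that module j
    is defective.
    """
    n_models = len(probas)
    return [sum(column) / n_models for column in zip(*probas)]


def threshold_move(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a module defective (1) when its score reaches the threshold.

    Lowering the threshold below 0.5 favours the minority (defective) class.
    """
    return [1 if s >= threshold else 0 for s in scores]


def pd_pf_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return recall (Pd), false positive rate (Pf) and F-measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * pd / (precision + pd) if precision + pd else 0.0
    return pd, pf, f1


# Hypothetical toy data: 6 modules, 2 of them defective (class imbalance).
y_true = [1, 1, 0, 0, 0, 0]
clf_a = [0.6, 0.4, 0.3, 0.1, 0.2, 0.5]
clf_b = [0.7, 0.3, 0.2, 0.2, 0.1, 0.4]
clf_c = [0.5, 0.5, 0.4, 0.0, 0.3, 0.3]

scores = soft_vote([clf_a, clf_b, clf_c])
default_pred = threshold_move(scores, 0.5)   # conventional cut-off
moved_pred = threshold_move(scores, 0.35)    # assumed, lowered cut-off

print("default:", default_pred, pd_pf_f1(y_true, default_pred))
print("moved:  ", moved_pred, pd_pf_f1(y_true, moved_pred))
```

On this toy data, lowering the threshold recovers the second defective module at the cost of one false alarm, which is the trade-off between Pd and Pf that threshold moving is meant to control.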

Key words: Class weighted self-adaptation, Ensemble learning, Soft ensemble, Soft voting, Software defect prediction, Threshold-moving

CLC Number: 

  • TP311