Computer Science ›› 2023, Vol. 50 ›› Issue (12): 24-31. doi: 10.11896/jsjkx.221100171

• Computer Software •

Aggregation Model for Software Defect Prediction Based on Data Enhancement by GAN

XU Jinpeng1, GUO Xinfeng1, WANG Ruibo2, LI Jihong2   

  1 School of Automation and Software Engineering, Shanxi University, Taiyuan 030006, China
  2 School of Modern Education Technology, Shanxi University, Taiyuan 030006, China
  • Received:2022-11-21 Revised:2023-04-07 Online:2023-12-15 Published:2023-12-07
  • About author: XU Jinpeng, born in 1998, postgraduate. His main research interests include software defect prediction and deep learning.
    LI Jihong, born in 1964, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include deep learning, natural language processing and software defect prediction.
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China(61806115).

Abstract: In software defect prediction (SDP), a machine learning classification algorithm is usually used to build the prediction model from datasets of static software features such as C&K metrics. However, most datasets of static software metrics contain few defects, and the resulting severe class imbalance lowers the prediction performance of the model. Based on the generative adversarial network (GAN), this paper generates positive sample data with FID score screening to increase the amount of positive data, then aggregates the results of the learned models by majority voting, and finally builds the SDP model based on block-regularized m×2 cross-validation (m×2BCV). Twenty datasets from the PROMISE database are used as the experimental datasets, and the random forest algorithm is used to build the models. Experimental results show that, compared with traditional random over-sampling, SMOTE, and random under-sampling, the average F1 value of the SDP aggregation model over the 20 datasets increases by 10.2%, 5.7%, and 3.4%, respectively, and the stability of F1 improves accordingly. On 17 of the 20 datasets, the SDP aggregation model achieves the highest F1 value. In terms of AUC, there is no significant difference between the proposed method and the traditional sampling methods.
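
The two generic building blocks named in the abstract, a Fréchet-style distance for screening generated samples and majority voting over several learned models, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the FID score is computed on tabular metrics or how voting ties are resolved. The function names, the diagonal-covariance simplification, and the rule that ties count as non-defective are all assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean, pvariance


def frechet_distance_diag(real, generated):
    """Fréchet distance between two samples of feature vectors.

    Assumption: each sample is modelled as a Gaussian with a diagonal
    covariance matrix, so the distance reduces to a per-feature sum of
    (mean difference)^2 + (std difference)^2.
    """
    total = 0.0
    # zip(*rows) transposes the row lists into per-feature columns.
    for real_col, gen_col in zip(zip(*real), zip(*generated)):
        mu_r, mu_g = fmean(real_col), fmean(gen_col)
        var_r, var_g = pvariance(real_col), pvariance(gen_col)
        total += (mu_r - mu_g) ** 2 + (sqrt(var_r) - sqrt(var_g)) ** 2
    return total


def majority_vote(predictions):
    """Aggregate 0/1 predictions, given as one list per model.

    A sample is labelled defective (1) only on a strict majority.
    Assumption: ties, which are possible with an even number of
    models, are resolved as non-defective (0).
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*predictions)]


# Hypothetical toy data: two software metrics per module.
real_pos = [[1.0, 2.0], [3.0, 4.0]]
generated_far = [[10.0, 2.0], [12.0, 4.0]]
print(frechet_distance_diag(real_pos, real_pos))       # identical samples
print(frechet_distance_diag(real_pos, generated_far))  # shifted first metric

# Three hypothetical models voting on three modules.
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

A lower distance means the generated positives are statistically closer to the real positives, so a screening step would keep low-distance batches. In an m×2 cross-validation setting the number of trained models is even, which is why the tie rule has to be stated explicitly.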

Key words: Generative adversarial network, Data enhancement, Block-regularized m×2 cross validation, Software defect prediction, Aggregation model

CLC Number: 

  • TP311