Computer Science, 2021, Vol. 48, Issue (12): 131-139. doi: 10.11896/jsjkx.201000168

• Computer Software •

Noise Tolerable Feature Selection Method for Software Defect Prediction

TENG Jun-yuan, GAO Meng, ZHENG Xiao-meng, JIANG Yun-song   

  1. Beijing Institute of Control Engineering, Beijing 100190, China
  • Received: 2020-10-28 Revised: 2021-03-15 Online: 2021-12-15 Published: 2021-11-26
  • About author: TENG Jun-yuan, born in 1985, master, senior engineer. His main research interests include embedded software testing and software engineering.
    GAO Meng, born in 1982, master, senior engineer. His main research interests include embedded software testing and software engineering.
  • Supported by:
    National Natural Science Foundation of China (61802017) and Equipment Pre-Research Field Fund Project (61400020407).

Abstract: Software defect prediction identifies defective modules in advance by mining defect datasets, helping testers to test in a more targeted way. However, label noise is ubiquitous in these datasets and degrades the performance of the prediction model. Few feature selection methods have been designed specifically for noise tolerance. In addition, in the mainstream noise tolerable feature selection framework, the selection strategy can only be chosen manually based on human experience, which makes the framework difficult to apply in software engineering practice. In view of this, this paper proposes a novel method, NTFES (noise tolerable feature selection). NTFES first generates multiple Bootstrap samples by Bootstrap sampling. On these samples, it then divides the original features into groups using the approximate Markov blanket and selects candidate features from each group based on two heuristic feature selection strategies. Subsequently, it uses a genetic algorithm (GA) to search for the optimal feature subset in the candidate feature space. To verify the effectiveness of the proposed method, this paper uses the NASA MDP datasets and injects label noise to imitate noisy datasets. It then compares NTFES with classical baseline methods, such as FULL, FCBF and CFS, while controlling the ratio of label noise. The experimental results show that the proposed method achieves higher classification performance and better noise tolerance when the ratio of label noise is within an acceptable range.
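
The abstract names three building blocks that can be illustrated in a few lines: label noise injection at a controlled ratio, Bootstrap sampling, and the approximate Markov blanket test used to group features. The following is a minimal sketch, not the authors' implementation. It assumes binary labels coded as 0/1, noise injected by flipping uniformly chosen labels, discretized features, and the FCBF-style approximate Markov blanket test based on symmetrical uncertainty; the abstract does not confirm any of these details, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def inject_label_noise(labels, ratio, rng):
    """Flip a fraction `ratio` of binary (0/1) labels, chosen uniformly at random.

    Assumption: uniform flipping; the paper's noise model may differ.
    """
    noisy = list(labels)
    n_flip = int(round(ratio * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one Bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both sequences are constant
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def is_approx_markov_blanket(fi, fj, labels):
    """True if feature fi forms an approximate Markov blanket for feature fj.

    Assumed FCBF-style test: SU(fi, C) >= SU(fj, C) and SU(fi, fj) >= SU(fj, C).
    """
    su_j = symmetrical_uncertainty(fj, labels)
    return (symmetrical_uncertainty(fi, labels) >= su_j
            and symmetrical_uncertainty(fi, fj) >= su_j)


# Usage on a toy dataset (values are illustrative only).
rng = random.Random(0)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
noisy_labels = inject_label_noise(labels, 0.25, rng)   # flips 2 of 8 labels
sample_rows = bootstrap_indices(len(labels), rng)      # one Bootstrap sample
```

In a full pipeline, a feature covered by another feature's approximate Markov blanket would be placed in that feature's group, and the grouping would be repeated on each Bootstrap sample; the candidate selection strategies and the GA search are not sketched here because the abstract does not describe them in enough detail.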

Key words: Feature selection, Label noise, Noise tolerable, Software defect prediction, Software testing

CLC Number: 

  • TP391