Computer Science, 2024, Vol. 51, Issue (6): 68-77. doi: 10.11896/jsjkx.230400017

• Computer Software •

Identifying Coincidental Correct Test Cases Based on Machine Learning

TIAN Shuaihua, LI Zheng, WU Yonghao, LIU Yong   

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2023-04-03  Revised: 2023-07-11  Online: 2024-06-15  Published: 2024-06-05
  • About the authors: TIAN Shuaihua, born in 1998, postgraduate. His main research interests include fault localization and artificial intelligence.
    WU Yonghao, born in 1995, Ph.D. candidate. His main research interests include software testing and fault localization.
  • Supported by:
    National Natural Science Foundation of China (61902015, 61872026).

Abstract: Spectrum-based fault localization (SBFL) techniques have been widely studied as a way to help developers quickly locate faults and thereby reduce the cost of program debugging. However, a test suite may contain a special kind of test case that executes the faulty statement yet still produces the expected output; such a test case is called a coincidentally correct (CC) test case. CC test cases can degrade the fault localization performance of SBFL. To mitigate the negative impact of CC test cases and improve the performance of SBFL techniques, this paper proposes an approach for identifying CC test cases via machine learning (CCIML). CCIML uses features extracted from the SBFL suspiciousness formula together with static program features to identify CC test cases, thereby improving the fault localization accuracy of SBFL. To evaluate CCIML, experiments are conducted on the Defects4J dataset. The results show that the average recall, precision, and F1 score of CCIML in identifying CC test cases are 63.89%, 70.16%, and 50.64%, respectively, which is better than the baselines. In addition, when the CC test cases identified by CCIML are processed with the cleaning and relabeling strategies, the resulting fault localization performance is also better than that of the baselines. Under the cleaning and relabeling strategies, the numbers of faulty statements ranked first by suspiciousness are 328 and 312, respectively. Compared with the fuzzy weighted K-nearest neighbor (FW-KNN) approach, the fault localization accuracy is improved by 124.66% and 235.48%, respectively.
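To make the mechanism in the abstract concrete, the following Python sketch shows, on a small hypothetical spectrum, how CC test cases can push a faulty statement down an SBFL ranking and how the two post-processing strategies change that ranking. It is not the CCIML implementation. It assumes the Ochiai suspiciousness formula (the abstract does not say which formula CCIML uses) and the common definitions from the CC literature: cleaning removes flagged tests from the suite, and relabeling treats them as failing. The set `predicted_cc` is a hypothetical stand-in for a classifier's output, and all names and data are illustrative. The sketch also computes precision, recall, and F1 for the CC identification step.

```python
from math import sqrt

# Hypothetical toy spectrum: COVERAGE[t][s] == 1 if test t executes statement s.
# Statement 1 is assumed to be the faulty statement.
COVERAGE = [
    [1, 1, 0],  # t0: fails
    [0, 1, 1],  # t1: fails
    [0, 1, 1],  # t2: passes but executes the fault (CC)
    [0, 1, 1],  # t3: passes but executes the fault (CC)
    [0, 1, 1],  # t4: passes but executes the fault (CC)
    [0, 0, 1],  # t5: passes and never reaches the fault
]
FAILED = [True, True, False, False, False, False]
TRUE_CC = {2, 3, 4}


def ochiai_scores(coverage, failed):
    """Assumed formula (Ochiai): ef / sqrt((ef + nf) * (ef + ep)) per statement."""
    total_failed = sum(failed)  # equals ef + nf for every statement
    scores = []
    for s in range(len(coverage[0])):
        ef = sum(1 for row, f in zip(coverage, failed) if f and row[s])
        ep = sum(1 for row, f in zip(coverage, failed) if not f and row[s])
        denom = sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


def top_statement(scores):
    """Index of the statement ranked first by suspiciousness (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def clean(coverage, failed, cc_ids):
    """Cleaning strategy: drop the test cases flagged as CC."""
    keep = [t for t in range(len(coverage)) if t not in cc_ids]
    return [coverage[t] for t in keep], [failed[t] for t in keep]


def relabel(coverage, failed, cc_ids):
    """Relabeling strategy: treat the flagged CC test cases as failing."""
    return coverage, [f or (t in cc_ids) for t, f in enumerate(failed)]


def precision_recall_f1(predicted, actual):
    """Precision, recall and F1 of a predicted CC set against the true CC set."""
    tp = len(predicted & actual)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(actual) if actual else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    predicted_cc = {2, 3}  # hypothetical classifier output; it misses t4
    print("original :", top_statement(ochiai_scores(COVERAGE, FAILED)))
    print("cleaned  :", top_statement(ochiai_scores(*clean(COVERAGE, FAILED, predicted_cc))))
    print("relabeled:", top_statement(ochiai_scores(*relabel(COVERAGE, FAILED, predicted_cc))))
    print("P/R/F1   :", precision_recall_f1(predicted_cc, TRUE_CC))
```

On this toy data, the original spectrum ranks statement 0 first because the three CC tests inflate the passing count of the faulty statement 1. After either cleaning or relabeling with the imperfect `predicted_cc`, statement 1 moves to the top. This illustrates why even partial CC identification (here, precision 1.0 and recall 2/3) can help SBFL.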

Key words: Software debugging, Fault localization, Machine learning, Coincidentally correct test case, Feature extraction

CLC Number: TP311