Computer Science ›› 2024, Vol. 51 ›› Issue (6): 68-77. doi: 10.11896/jsjkx.230400017

• Computer Software •

Identifying Coincidental Correct Test Cases Based on Machine Learning

TIAN Shuaihua, LI Zheng, WU Yonghao, LIU Yong

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2023-04-03 Revised: 2023-07-11 Online: 2024-06-15 Published: 2024-06-05
  • Corresponding author: WU Yonghao (appmlk@outlook.com)
  • About author (2021210481@buct.edu.cn): TIAN Shuaihua, born in 1998, postgraduate. His main research interests include fault localization and artificial intelligence.
    WU Yonghao, born in 1995, Ph.D. candidate. His main research interests include software testing and fault localization.
  • Supported by:
    National Natural Science Foundation of China (61902015, 61872026).

Abstract: Spectrum-based fault localization (SBFL) techniques have been widely studied because they help developers quickly locate faults in a program and thereby reduce the cost of program debugging. However, test suites can contain a special kind of test case that executes the faulty statement yet still produces the expected output; such a test case is called a coincidental correct (CC) test case. CC test cases negatively affect the performance of SBFL. To mitigate this impact and improve SBFL performance, this paper proposes CCIML (CC test cases Identification via Machine Learning), a machine-learning-based approach for identifying CC test cases. CCIML combines features derived from SBFL suspiciousness formulas with static program features to identify CC test cases, and thus improves the fault localization accuracy of SBFL. To evaluate CCIML, comparative experiments are carried out on the Defects4J dataset. The results show that the average recall, precision, and F1 score of CCIML in identifying CC test cases are 63.89%, 70.16%, and 50.64%, respectively, which is better than the baseline approaches. In addition, after the CC test cases identified by CCIML are handled with the cleaning and relabeling strategies, the resulting fault localization performance is also better than that of the baselines. Under the cleaning and relabeling strategies, the number of faulty statements whose suspiciousness value ranks first is 328 and 312, respectively; compared with the fuzzy weighted K-nearest neighbor (FW-KNN) approach, the number of faults located increases by 124.66% and 235.48%, respectively.
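To make the effect of CC test cases on SBFL concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Ochiai formula as the suspiciousness metric (the abstract does not name a specific formula) and takes the cleaning and relabeling strategies in their usual sense from the CC literature: cleaning drops the identified CC test cases from the suite, while relabeling treats them as failing. The coverage matrix, the test outcomes, and the set of CC test indices are hypothetical toy data; in CCIML the CC indices would come from the trained classifier rather than being given.

```python
import math

def ochiai(coverage, outcomes):
    """Ochiai suspiciousness per statement.

    coverage[i][j] is 1 if test i executed statement j, else 0.
    outcomes[i] is True if test i failed.
    """
    total_failed = sum(outcomes)
    scores = []
    for j in range(len(coverage[0])):
        ef = sum(1 for row, failed in zip(coverage, outcomes) if row[j] and failed)
        ep = sum(1 for row, failed in zip(coverage, outcomes) if row[j] and not failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

def clean(coverage, outcomes, cc_indices):
    """Cleaning strategy (assumed meaning): remove CC tests from the suite."""
    cc = set(cc_indices)
    kept = [i for i in range(len(outcomes)) if i not in cc]
    return [coverage[i] for i in kept], [outcomes[i] for i in kept]

def relabel(coverage, outcomes, cc_indices):
    """Relabeling strategy (assumed meaning): treat CC tests as failing."""
    cc = set(cc_indices)
    return coverage, [True if i in cc else failed for i, failed in enumerate(outcomes)]

def rank_of(scores, stmt):
    """1-based worst-case rank of a statement (ties count against it)."""
    return sum(1 for s in scores if s >= scores[stmt])

# Hypothetical toy program: 3 statements, statement index 1 is faulty.
FAULTY = 1
COVERAGE = [
    [1, 1, 0],  # test 0: fails
    [0, 1, 1],  # test 1: passes although it runs the faulty statement (CC)
    [0, 1, 1],  # test 2: passes although it runs the faulty statement (CC)
    [1, 0, 1],  # test 3: passes, does not run the faulty statement
]
OUTCOMES = [True, False, False, False]
CC_TESTS = [1, 2]  # assumed to be the classifier's predictions

rank_original = rank_of(ochiai(COVERAGE, OUTCOMES), FAULTY)
rank_cleaned = rank_of(ochiai(*clean(COVERAGE, OUTCOMES, CC_TESTS)), FAULTY)
rank_relabeled = rank_of(ochiai(*relabel(COVERAGE, OUTCOMES, CC_TESTS)), FAULTY)
print(rank_original, rank_cleaned, rank_relabeled)
```

In this toy suite the two CC tests inflate the count of passing executions of the faulty statement, so a non-faulty statement outranks it; either strategy moves the faulty statement back to the top. Real suites will not behave this cleanly, especially when the classifier mislabels ordinary passing tests as CC.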

Key words: Software debugging, Fault localization, Machine learning, Coincidental correct test case, Feature extraction

CLC Number:

  • TP311