Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600121-8. doi: 10.11896/jsjkx.230600121

• Computer Software & Architecture •


Test Input Prioritization Approach Based on DNN Model Output Differences

ZHU Jin1, TAO Chuanqi1,2,3,4, GUO Hongjing1   

  1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
    2 Ministry Key Laboratory for Safety-Critical Software Development and Verification, Nanjing 210016, China
    3 State Key Laboratory for Novel Software Technology, Nanjing 210023, China
    4 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210016, China
  • Published: 2024-06-06
  • Corresponding author: TAO Chuanqi (taochuanqi@nuaa.edu.cn)
  • About author: ZHU Jin, born in 1996, postgraduate (513689570@qq.com). His main research interests include deep learning testing.
    TAO Chuanqi, Ph.D, associate professor. His main research interests include intelligent software testing, regression testing, cloud-based mobile testing as a service, and quality assurance for big data applications.


Abstract: Deep neural network (DNN) testing requires a large amount of test data to ensure DNN quality. However, most test inputs lack labels, and annotating them manually is costly. To reduce annotation cost, researchers have proposed test input prioritization approaches that select high-priority test inputs for labeling. Most existing prioritization methods, however, perform poorly in certain scenarios; for example, they struggle to identify misclassified inputs that the model predicts with high confidence. To address this challenge, this paper applies differential testing to test input prioritization and proposes a test input prioritization approach based on DNN model output differences (DeepDiff). DeepDiff first constructs a contrast model with the same functionality as the original model, then computes the output difference of each test input between the original model and the contrast model, and finally assigns higher priority to test inputs with larger output differences. An empirical study is conducted on four widely used datasets and the corresponding eight DNN models. Experimental results show that, on average, DeepDiff outperforms the baseline approaches in effectiveness by 13.06% on the original test sets and by 39.69% on the mixed test sets.
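To make the ranking step concrete, below is a minimal sketch of the prioritization idea described in the abstract. It is a hypothetical illustration, not the paper's implementation: it assumes each model is exposed as a callable returning softmax probability vectors, and it measures the output difference with an L1 distance between the two probability vectors; DeepDiff's actual contrast-model construction and distance metric may differ.

# Minimal sketch of output-difference-based test input prioritization.
# Assumptions (not from the paper): models are callables mapping a batch of
# inputs to (n, num_classes) softmax probabilities; the output difference is
# the L1 distance between the two probability vectors.
import numpy as np

def prioritize_by_output_difference(original_predict, contrast_predict, test_inputs):
    """Return indices of test_inputs ordered from highest to lowest priority."""
    p_orig = np.asarray(original_predict(test_inputs))      # (n, num_classes)
    p_contrast = np.asarray(contrast_predict(test_inputs))  # (n, num_classes)
    # Output difference per input: L1 distance between probability vectors.
    diff = np.abs(p_orig - p_contrast).sum(axis=1)
    # Larger difference -> higher priority (earlier in the returned order).
    return np.argsort(-diff)

if __name__ == "__main__":
    # Toy example with two stand-in "models" that disagree on a few inputs.
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(5, 8))

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    model_a = lambda x: softmax(x[:, :3])
    model_b = lambda x: softmax(x[:, :3] + 5.0 * np.eye(5, 3))  # perturb inputs 0-2
    print(prioritize_by_output_difference(model_a, model_b, inputs))

How the contrast model is obtained is not specified in the abstract; the sketch simply takes it as a given callable and only illustrates the "rank by output difference" step.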

Key words: Deep neural network testing, Test input prioritization, Differential testing, Model output differences

CLC Number: TP311