Computer Science ›› 2024, Vol. 51 ›› Issue (6): 34-43. doi: 10.11896/jsjkx.230400029
ZHAO Tong, SHA Chaofeng
Abstract: Deep neural networks (DNNs) have been widely applied to a variety of tasks, and thoroughly testing a DNN before deployment is particularly important, which requires constructing test sets capable of adequately exercising the DNN. Because labeling budgets are limited, a test subset is usually obtained through test input selection. However, when prediction-uncertainty-based methods, which have shown remarkable ability in uncovering misclassified inputs and improving retraining performance, are used for test input selection, the question of whether the uncertainty estimates for those test inputs are themselves accurate has been overlooked. To fill this research gap, this paper experimentally reveals, both qualitatively and quantitatively, the correlation between the degree of model calibration and the uncertainty metrics used in test input selection. Since calibrating a model yields more accurate predictive uncertainty estimates, we further study whether models with different degrees of calibration produce test subsets of different quality when those subsets are selected with uncertainty metrics. Extensive experiments and analyses on three public datasets and four convolutional neural network (CNN) architectures show that, for CNN models: 1) uncertainty metrics and model calibration are correlated to a certain degree; and 2) well-calibrated models select higher-quality test subsets than poorly calibrated models. In terms of the ability to uncover misclassified inputs, 70.57% of the models trained with calibration outperform their uncalibrated counterparts. It is therefore important to take model calibration into account in test input selection, and model calibration can be used to improve test selection performance.
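To make the two quantities at the heart of this study concrete, the following minimal Python sketch (an illustration, not the paper's implementation) shows a softmax-based uncertainty score used to prioritize test inputs, here a DeepGini-style Gini impurity, together with Expected Calibration Error (ECE), the standard measure of how well predicted confidence matches accuracy. The function names, the choice of Gini impurity as the uncertainty metric, and the 15-bin ECE setting are assumptions made for illustration.

import numpy as np

def gini_uncertainty(probs: np.ndarray) -> np.ndarray:
    # DeepGini-style impurity of each softmax vector: 1 - sum_c p_c^2.
    # Higher values mean the model is less certain about the input.
    return 1.0 - np.sum(probs ** 2, axis=1)

def select_test_subset(probs: np.ndarray, budget: int) -> np.ndarray:
    # Return indices of the `budget` most uncertain test inputs.
    scores = gini_uncertainty(probs)
    return np.argsort(-scores)[:budget]

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray,
                               n_bins: int = 15) -> float:
    # Bin predictions by confidence and average the gap between
    # per-bin accuracy and per-bin mean confidence, weighted by bin size.
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(accuracies[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Example with random softmax outputs for a hypothetical 10-class model.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
z = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)
subset = select_test_subset(probs, budget=100)
print("ECE:", expected_calibration_error(probs, labels))

Under this reading, the paper's central question is whether select_test_subset returns a higher-quality subset when the model producing probs is better calibrated, i.e. has a lower ECE.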