Computer Science ›› 2025, Vol. 52 ›› Issue (2): 374-379. doi: 10.11896/jsjkx.240400210
CHEN Zigang1,2,3, PAN Ding1, LENG Tao2, ZHU Haihua1, CHEN Long1, ZHOU Yousheng1
Abstract: As deep learning interpretability advances, it also faces serious security challenges. A model's explanations of its input data are at risk of malicious manipulation attacks; such attacks severely limit the application scenarios of interpretability techniques and hinder human exploration and understanding of models. To address this problem, an adversarial training method for explanation robustness is proposed that uses model gradients as a similarity constraint. First, adversarial training data are generated by sampling along the explanation direction. Second, the gradient information of samples during training is used to compute multiple similarity metrics between the explanations of the sampled data, which regularize the model and smooth its curvature. Finally, to verify the effectiveness of the proposed method, experiments are conducted on multiple datasets and explanation methods. The results show that the proposed method is significantly effective in defending against adversarial explanation examples.
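The training scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two-layer network, the step size `eps`, and the use of cosine similarity as the single similarity metric are all illustrative assumptions; the paper combines several similarity indices and trains on real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in model: a tiny two-layer ReLU network.
W1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=8)

def forward(x):
    h = np.maximum(W1 @ x, 0.0)          # ReLU hidden layer
    return w2 @ h

def saliency(x):
    """Gradient of the output w.r.t. the input: a simple explanation map."""
    mask = (W1 @ x > 0).astype(float)    # ReLU gate pattern at x
    return W1.T @ (w2 * mask)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def explanation_regularizer(x, eps=0.1):
    """Sample along the explanation direction, then penalise explanation drift.

    Returns 1 - cos(saliency(x), saliency(x')), which is 0 when the two
    explanations agree perfectly and grows as they diverge.
    """
    g = saliency(x)
    x_adv = x + eps * g / (np.linalg.norm(g) + 1e-12)  # sample along explanation
    return 1.0 - cosine(saliency(x), saliency(x_adv))  # similarity constraint

x = rng.normal(size=4)
reg = explanation_regularizer(x)
print(reg)
# During training, this term would be added to the task loss:
#   total_loss = task_loss + lam * reg   (lam is a hyper-parameter)
```

The regularizer is non-negative and bounded by 2; minimizing it encourages explanations at nearby points to align, which is the curvature-smoothing effect the abstract refers to.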