Computer Science ›› 2025, Vol. 52 ›› Issue (2): 374-379. doi: 10.11896/jsjkx.240400210

• Information Security •

Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing

CHEN Zigang1,2,3, PAN Ding1, LENG Tao2, ZHU Haihua1, CHEN Long1, ZHOU Yousheng1   

1 Chongqing Key Laboratory of Cyberspace Security Monitoring and Governance, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
    2 Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou, Sichuan 646000, China
    3 Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2024-04-30 Revised: 2024-08-26 Online: 2025-02-15 Published: 2025-02-17
  • Corresponding author: LENG Tao (lengtao@iie.ac.cn)
  • About author: CHEN Zigang, born in 1978, Ph.D, associate professor, is a member of CCF (No.F8469M). His main research interests include Internet of Things, Internet of Vehicles security and forensics, intelligent security and forensics, and data security and privacy protection (chenzg@cqupt.edu.cn).
    LENG Tao, born in 1986, Ph.D, associate professor. His main research interests include threat hunting, advanced threat detection, forensic analysis, and graph neural networks.
  • Supported by:
    National Natural Science Foundation of China (62272076) and the Opening Project of the Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2022KFZD002).

Abstract: While the interpretability of deep learning is developing, its security also faces significant challenges. A model's interpretation of its input data can be maliciously manipulated, and such attacks severely limit the application scenarios of interpretability techniques and hinder human exploration and understanding of the model. To address this problem, an explanation-robust adversarial training method that uses model gradients as a similarity constraint is proposed. First, adversarial training data are generated by sampling along the interpretation direction. Second, multiple similarity metrics between the interpretations of the sampled data are computed from the gradient information of the samples during training; these metrics regularize the model and smooth its curvature. Finally, to verify the effectiveness of the proposed method, it is evaluated on multiple datasets and interpretation methods. Experimental results show that the proposed method is highly effective at defending against adversarial explanation examples.

Key words: Deep learning, Interpretability, Adversarial attack, Adversarial training, Adversarial examples

CLC Number: TP309