Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing

doi:10.11896/jsjkx.240400210

Computer Science, 2025, Vol. 52, Issue (2): 374-379. doi: 10.11896/jsjkx.240400210

• Information Security •

Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing

CHEN Zigang¹,²,³, PAN Ding¹, LENG Tao², ZHU Haihua¹, CHEN Long¹, ZHOU Yousheng¹

1 Chongqing Key Laboratory of Cyberspace Security Monitoring and Governance, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou, Sichuan 646000, China
3 Key Laboratory of Cyberspace Big Data Intelligent Security, Ministry of Education, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2024-04-30 Revised: 2024-08-26 Online: 2025-02-15 Published: 2025-02-17
About author: CHEN Zigang, born in 1978, Ph.D, associate professor, is a member of CCF (No.F8469M). His main research interests include Internet of Things, Internet of Vehicles security and forensics, intelligent security and forensics, data security and privacy protection.
LENG Tao, born in 1986, Ph.D, associate professor. His main research interests include threat hunting, advanced threat detection, forensic analysis and graph neural networks.
Supported by:
National Natural Science Foundation of China (62272076) and Opening Project of Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2022KFZD002).

Abstract

As deep learning interpretability techniques develop, their security also faces significant challenges. A model's explanation of its input data can be maliciously manipulated and attacked, which severely limits the scenarios in which interpretability techniques can be applied and hinders human exploration and understanding of the model. To address this issue, an explanation-robust adversarial training method that uses model gradients as similarity constraints is proposed. First, adversarial training data are generated by sampling along the explanation direction. Second, multiple similarity metrics between the explanations of the sampled data are computed from the gradient information of the samples during training and are used to regularize the model and smooth its curvature. Finally, the proposed method is validated on multiple datasets and explanation methods. The experimental results show that it is markedly effective at defending against adversarial explanation samples.
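The steps named in the abstract (sampling along the explanation direction, then penalizing dissimilarity between explanations) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy model `f(x) = sum_j v[j] * tanh(W[j] . x)`, the plain input gradient as the explanation, the choice of cosine and squared L2 distance as the two similarity metrics, and the hypothetical names `saliency`, `sample_along_explanation`, `explanation_penalty`, `eps`, `n_samples`, and `l2_weight`. The abstract does not state which metrics or step sizes the authors use.

```python
import math

def saliency(W, v, x):
    """Input gradient of the toy model f(x) = sum_j v[j] * tanh(W[j] . x)."""
    grad = [0.0] * len(x)
    for w_row, v_j in zip(W, v):
        z = sum(w * xi for w, xi in zip(w_row, x))
        slope = v_j * (1.0 - math.tanh(z) ** 2)  # d/dz of v_j * tanh(z)
        for i, w in enumerate(w_row):
            grad[i] += slope * w
    return grad

def norm(a):
    return math.sqrt(sum(ai * ai for ai in a))

def sample_along_explanation(x, g, eps, n_samples):
    """Return n_samples points x + t * g/||g|| with evenly spaced t in (0, eps]."""
    n = norm(g)
    if n == 0.0:
        return [list(x) for _ in range(n_samples)]  # no direction to move in
    unit = [gi / n for gi in g]
    return [
        [xi + eps * (k + 1) / n_samples * ui for xi, ui in zip(x, unit)]
        for k in range(n_samples)
    ]

def explanation_penalty(W, v, x, eps=0.5, n_samples=4, l2_weight=1.0):
    """Mean dissimilarity between the explanation at x and at the sampled points.

    Assumed metrics (illustrative only): cosine distance plus weighted squared L2.
    """
    g = saliency(W, v, x)
    total = 0.0
    for x_s in sample_along_explanation(x, g, eps, n_samples):
        g_s = saliency(W, v, x_s)
        denom = norm(g) * norm(g_s)
        cos_dist = 0.0
        if denom > 0.0:
            dot = sum(a * b for a, b in zip(g, g_s))
            cos_dist = max(0.0, 1.0 - dot / denom)  # clamp rounding noise
        l2_dist = sum((a - b) ** 2 for a, b in zip(g, g_s))
        total += cos_dist + l2_weight * l2_dist
    return total / n_samples

# Hypothetical parameters and input, chosen only to exercise the functions.
W = [[1.0, -2.0], [0.5, 1.5]]
v = [1.0, -1.0]
x = [0.2, 0.1]
print("explanation penalty:", explanation_penalty(W, v, x))
```

In a training loop, a penalty of this kind would be added to the task loss with some weight, so that minimizing it pushes explanations at nearby points to agree; differentiating it with respect to the model parameters requires second-order gradients, which an automatic differentiation framework would normally supply.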

Key words: Deep learning, Interpretability, Adversarial attack, Adversarial training, Adversarial samples

CLC Number: TP309

CHEN Zigang, PAN Ding, LENG Tao, ZHU Haihua, CHEN Long, ZHOU Yousheng. Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing [J]. Computer Science, 2025, 52(2): 374-379.
