Computer Science ›› 2025, Vol. 52 ›› Issue (4): 336-342. doi: 10.11896/jsjkx.240100005

• Information Security •

Self-supervised Backdoor Attack Defence Method Based on Poisoned Classifier

WANG Yifei1, ZHANG Shengjie1, XUE Dizhan2, QIAN Shengsheng2   

  1. Henan Institute of Advanced Technology,Zhengzhou University,Zhengzhou 450000,China
    2. State Key Laboratory of Multimodal Artificial Intelligence Systems,Institute of Automation,Chinese Academy of Sciences,Beijing 100190,China
  • Received:2024-01-02 Revised:2024-05-29 Online:2025-04-15 Published:2025-04-14
  • Corresponding author: QIAN Shengsheng(shengsheng.qian@nlpr.ia.ac.cn)
  • About author:WANG Yifei(wang_fei@gs.zzu.edu.cn),born in 1997,postgraduate.His main research interests include computer vision and natural language processing.
    QIAN Shengsheng,born in 1991,Ph.D,associate professor.His main research interests include data mining and multimedia content analysis.
  • Supported by:
    Beijing Natural Science Foundation(JQ23018).

Abstract: In recent years, the rapid rise of self-supervised learning (SSL) networks has become a pivotal force driving advances in deep learning, most visibly through pre-trained image models and large language models (LLMs), which have attracted worldwide attention. However, recent studies have revealed that SSL networks are vulnerable to backdoor attacks: by injecting a small number of samples carrying malicious backdoors into the training dataset, an attacker can manipulate the performance of the pre-trained model on downstream tasks. To defend against such SSL backdoor attacks, this paper proposes a self-supervised backdoor defence method based on a poisoned classifier, termed defending by poisoned classifier (DPC). By obtaining a threat model trained on the contaminated dataset, the proposed method can accurately detect poisoned samples. Experimental results show that, under the assumption that masking the backdoor trigger effectively changes the activation state of the downstream clustering model, DPC achieves a recall of 91.5% and a precision of 27.4% for backdoor trigger detection, surpassing the previous SOTA method. These results demonstrate that DPC detects potential threats effectively and provides a practical safeguard for the security of self-supervised learning networks.
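
To make the mechanism concrete, the following is a minimal sketch of the detection idea as the abstract describes it, not the authors' released implementation. It assumes a classifier poisoned_clf trained on the contaminated dataset, a self-supervised encoder, and a downstream cluster-assignment helper assign_cluster, all hypothetical names; trigger localization is delegated to Grad-CAM [18] (another CAM variant, e.g. Grad-CAM++ [41], could substitute), and the masking quantile q is an illustrative choice.

```python
# Hedged sketch of the DPC idea described in the abstract (illustrative, not
# the authors' code): localize a candidate trigger with Grad-CAM from a
# classifier trained on the poisoned data, occlude it, and flag the sample if
# the downstream cluster assignment changes.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, x):
    """Grad-CAM map for the top-1 class of model(x); x has shape (1,3,H,W)."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.zero_grad(set_to_none=True)
    logits = model(x)
    logits[0, logits[0].argmax()].backward()   # gradient of the top-1 logit
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((w * feats[0]).sum(dim=1, keepdim=True))  # (1,1,h,w)
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)

def flag_poisoned(x, poisoned_clf, cam_layer, encoder, assign_cluster, q=0.9):
    """Flag x if masking its most salient region flips the downstream cluster,
    i.e. the abstract's assumption that blocking the trigger changes the
    activation state of the downstream clustering model.
    assign_cluster is a hypothetical helper returning an integer cluster id."""
    cam = grad_cam(poisoned_clf, cam_layer, x)
    keep = (cam < torch.quantile(cam.flatten(), q)).float()  # drop top-q region
    with torch.no_grad():
        return assign_cluster(encoder(x)) != assign_cluster(encoder(x * keep))

def detection_metrics(pred, truth):
    """Recall and precision of the poison flags over a labeled validation split
    (the paper reports 91.5% recall and 27.4% precision)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    return tp / max(tp + fn, 1), tp / max(tp + fp, 1)  # guard empty denominators
```

Under this reading, a sample is flagged exactly when occluding its most salient region flips the downstream cluster. That such a flip test can also fire on some clean samples is consistent with the reported precision (27.4%) being far lower than the recall (91.5%); the quantile q and the choice of CAM variant are tunable assumptions here.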

Key words: Self-supervised networks, Artificial intelligence defence, Backdoor attacks, Image classification

CLC number: TP391

References:
[1]JAISWAL A,BABU A R,ZADEH M Z,et al.A survey on contrastive self-supervised learning[J].Technologies,2020,9(1):2.
[2]KRISHNAN R,RAJPURKAR P,TOPOL E J.Self-supervised learning in medicine and healthcare[J].Nature Biomedical Engineering,2022,6(12):1346-1352.
[3]LIU X,ZHANG F,HOU Z,et al.Self-supervised learning:Generative or contrastive[J].IEEE Transactions on Knowledge and Data Engineering,2021,35(1):857-876.
[4]MISRA I,MAATEN L.Self-supervised learning of pretext-invariant representations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:6707-6717.
[5]SCHIAPPA M C,RAWAT Y S,SHAH M.Self-supervised learning for videos:A survey[J].ACM Computing Surveys,2023,55(13s):1-37.
[6]CHEN T,KORNBLITH S,NOROUZI M,et al.A simple framework for contrastive learning of visual representations[C]//International Conference on Machine Learning.PMLR,2020:1597-1607.
[7]CHEN X,FAN H,GIRSHICK R,et al.Improved baselines with momentum contrastive learning[J].arXiv:2003.04297,2020.
[8]CHEN X,XIE S,HE K.An empirical study of training self-supervised vision transformers[C]//IEEE/CVF International Conference on Computer Vision(ICCV).IEEE,2021:9620-9629.
[9]GRILL J B,STRUB F,ALTCHÉ F,et al.Bootstrap your own latent:A new approach to self-supervised learning[J].Advances in Neural Information Processing Systems,2020,33:21271-21284.
[10]HE K,FAN H,WU Y,et al.Momentum contrast for unsupervised visual representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:9729-9738.
[11]CARLINI N,TERZIS A.Poisoning and backdooring contrastive learning[J].arXiv:2106.09667,2021.
[12]SAHA A,TEJANKAR A,KOOHPAYEGANI S A,et al.Backdoor attacks on self-supervised learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:13337-13346.
[13]LIU M,SANGIOVANNI-VINCENTELLI A,YUE X.Beating Backdoor Attack at Its Own Game[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:4620-4629.
[14]MU B,NIU Z,WANG L,et al.Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:20495-20503.
[15]PANG L,SUN T,LING H,et al.Backdoor cleansing with unlabeled data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:12218-12227.
[16]QI X,XIE T,WANG J T,et al.Towards a proactive ML approach for detecting backdoor poison samples[C]//32nd USENIX Security Symposium(USENIX Security 23).2023:1685-1702.
[17]TEJANKAR A,SANJABI M,WANG Q,et al.Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:12239-12249.
[18]SELVARAJU R R,COGSWELL M,DAS A,et al.Grad-cam:Visual explanations from deep networks via gradient-based localization[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:618-626.
[19]DOSOVITSKIY A,SPRINGENBERG J T,RIEDMILLER M,et al.Discriminative unsupervised feature learning with convolutional neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems.2014:766-774.
[20]GIDARIS S,SINGH P,KOMODAKIS N.Unsupervised representation learning by predicting image rotations[J].arXiv:1803.07728,2018.
[21]NOROOZI M,FAVARO P.Unsupervised learning of visual representations by solving jigsaw puzzles[C]//European Conference on Computer Vision.Cham:Springer International Publishing,2016:69-84.
[22]WU Z,XIONG Y,YU S X,et al.Unsupervised feature learning via non-parametric instance discrimination[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:3733-3742.
[23]ZHANG R,ISOLA P,EFROS A A.Colorful image colorization[C]//Computer Vision-ECCV 2016:14th European Conference,Amsterdam,The Netherlands,October 11-14,2016,Proceedings,Part III 14.Springer International Publishing,2016:649-666.
[24]CARON M,MISRA I,MAIRAL J,et al.Unsupervised learning of visual features by contrasting cluster assignments[J].Advances in Neural Information Processing Systems,2020,33:9912-9924.
[25]CHUANG C Y,ROBINSON J,LIN Y C,et al.Debiased contrastive learning[J].Advances in Neural Information Processing Systems,2020,33:8765-8775.
[26]SHAH A,SRA S,CHELLAPPA R,et al.Max-margin contrastive learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2022,36(8):8220-8230.
[27]YOU Y,CHEN T,SUI Y,et al.Graph contrastive learning with augmentations[J].Advances in Neural Information Processing Systems,2020,33:5812-5823.
[28]JIA J,LIU Y,GONG N Z.Badencoder:Backdoor attacks to pre-trained encoders in self-supervised learning[C]//2022 IEEE Symposium on Security and Privacy(SP).IEEE,2022:2043-2059.
[29]TAO G,WANG Z,FENG S,et al.Distribution preserving backdoor attack in self-supervised learning[C]//2024 IEEE Symposium on Security and Privacy(SP).IEEE Computer Society,2024.
[30]WANG Q,YIN C,FANG L,et al.SSL-OTA:Unveiling Backdoor Threats in Self-Supervised Learning for Object Detection[J].arXiv:2401.00137,2024.
[31]LI C,PANG R,XI Z,et al.An embarrassingly simple backdoor attack on self-supervised learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:4367-4378.
[32]RADFORD A,KIM J W,HALLACY C,et al.Learning transferable visual models from natural language supervision[C]//International Conference on Machine Learning.PMLR,2021:8748-8763.
[33]HUANG K,LI Y,WU B,et al.Backdoor defense via decoupling the training process[J].arXiv:2202.03423,2022.
[34]MIN R,QIN Z,SHEN L,et al.Towards stable backdoor purification through feature shift tuning[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems.2024:75286-75306.
[35]XU Q,TAO G,HONORIO J,et al.MEDIC:Remove Model Backdoors via Importance Driven Cloning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:20485-20494.
[36]ZHANG Z,LIU Q,WANG Z,et al.Backdoor Defense via Deconfounded Representation Learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:12228-12238.
[37]ZHU M,WEI S,ZHA H,et al.Neural polarizer:A lightweight and effective backdoor defense via purifying poisoned features[C]//NeurIPS 2023.2023.
[38]BANSAL H,SINGHI N,YANG Y,et al.CleanCLIP:Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning[J].arXiv:2303.03323,2023.
[39]HONG S,CHANDRASEKARAN V,KAYA Y,et al.On the effectiveness of mitigating data poisoning attacks with gradient shaping[J].arXiv:2002.11497,2020.
[40]YUN S,HAN D,CHUN S,et al.CutMix:Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:6023-6032.
[41]CHATTOPADHAY A,SARKAR A,HOWLADER P,et al.Grad-CAM++:Generalized gradient-based visual explanations for deep convolutional networks[C]//2018 IEEE Winter Conference on Applications of Computer Vision(WACV).IEEE,2018:839-847.
[42]JIANG P T,ZHANG C B,HOU Q,et al.Layercam:Exploring hierarchical class activation maps for localization[J].IEEE Transactions on Image Processing,2021,30:5875-5888.
[43]WANG H,WANG Z,DU M,et al.Score-CAM:Score-weighted visual explanations for convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.2020:24-25.
[44]TIAN Y,KRISHNAN D,ISOLA P.Contrastive multiview coding[C]//Computer Vision-ECCV 2020:16th European Conference,Glasgow,UK,August 23-28,2020,Proceedings,Part XI 16.Springer International Publishing,2020:776-794.
[45]RUSSAKOVSKY O,DENG J,SU H,et al.Imagenet large scale visual recognition challenge[J].International Journal of Computer Vision,2015,115:211-252.