Computer Science, 2025, Vol. 52, Issue (4): 336-342. doi: 10.11896/jsjkx.240100005

• Information Security •

Self-supervised Backdoor Attack Defence Method Based on Poisoned Classifier

WANG Yifei1, ZHANG Shengjie1, XUE Dizhan2, QIAN Shengsheng2   

  1 Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450000, China
  2 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2024-01-02 Revised:2024-05-29 Online:2025-04-15 Published:2025-04-14
  • About author: WANG Yifei, born in 1997, postgraduate. His main research interests include computer vision and natural language processing.
    QIAN Shengsheng, born in 1991, Ph.D, associate professor. His main research interests include data mining and multimedia content analysis.
  • Supported by:
    Beijing Natural Science Foundation(JQ23018).

Abstract: In recent years, self-supervised learning (SSL) networks have become a major driver of progress in deep learning, most visibly through pre-trained image models and large language models (LLMs). Recent studies, however, show that SSL networks are vulnerable to backdoor attacks: by inserting a small number of training samples carrying a malicious backdoor into the training dataset, an attacker can manipulate the performance of a pre-trained model on downstream tasks. To defend against such SSL backdoor attacks, we propose a novel defense mechanism, defending by poisoned classifier (DPC), which leverages a poisoned classifier. DPC trains a threat model on a dataset intentionally contaminated with adversarial samples, which enables the method to accurately identify and detect toxic samples in the training data. Under the assumption that blocking the backdoor trigger effectively changes the activation state of downstream clustering models, DPC achieves a recall of 91.5% and a precision of 27.4% for backdoor trigger detection in our experiments, outperforming the original SOTA method. These results indicate that the proposed method is effective in fortifying self-supervised learning networks against backdoor threats and in improving their overall security, contributing to the integrity and reliability of self-supervised learning models.
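
For readers less familiar with the detection metrics quoted in the abstract, the following is a minimal sketch of how recall and precision are conventionally computed for a poisoned-sample detector. It is not the authors' implementation: the function name `detection_metrics`, the boolean label format, and the toy data are all assumptions made for illustration, and the numbers it produces are unrelated to the reported results.

```python
def detection_metrics(is_poisoned, is_flagged):
    """Return (recall, precision) for a binary poisoned-sample detector.

    is_poisoned: ground-truth booleans, True if the sample carries a trigger.
    is_flagged:  detector output booleans, True if the sample was flagged.
    Both names are hypothetical and chosen only for this illustration.
    """
    pairs = list(zip(is_poisoned, is_flagged))
    tp = sum(1 for p, f in pairs if p and f)        # poisoned and flagged
    fn = sum(1 for p, f in pairs if p and not f)    # poisoned but missed
    fp = sum(1 for p, f in pairs if not p and f)    # clean but flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy data (made up): 4 poisoned samples among 12, detector flags 10 samples.
is_poisoned = [True] * 4 + [False] * 8
is_flagged = [True, True, True, False] + [True] * 7 + [False]

recall, precision = detection_metrics(is_poisoned, is_flagged)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

A detector with high recall and low precision, as in this toy case, catches most poisoned samples at the cost of also flagging many clean ones.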

Key words: Self-supervised networks, Artificial intelligence defence, Backdoor attacks, Image classification

CLC Number: TP391