Computer Science ›› 2026, Vol. 53 ›› Issue (1): 382-394. doi: 10.11896/jsjkx.241200105
HUANG Rong1,2, TANG Yingchun1, ZHOU Shubo1,2, JIANG Xueqin1,2
Abstract: In a backdoor attack, an attacker poisons a dataset to covertly induce the victim model to associate the poisoned data with a target label, threatening the trustworthiness and security of artificial intelligence. Existing backdoor attack methods generally suffer from a trade-off between effectiveness and stealthiness: highly effective triggers are poorly concealed, while well-concealed triggers are weakly effective. To address this problem, a composite-trigger clean-label backdoor attack with joint vision-text features is proposed. The composite trigger is the superposition of two learnable parts, a universal trigger and a personalized trigger. Both the design and the optimization of the composite trigger are constrained by the congruence of pixel values within image blocks, aiming to induce the victim model to capture the congruence pattern, establish an association between the trigger and the target label, and thereby form the backdoor. The universal trigger makes the in-block pixel values of a poisoned image congruent with respect to the bit weight 2; its signal form is single and fixed across all poisoned images. The personalized trigger makes the edge pixel values of a poisoned image congruent with respect to the bit weight of the LoSB (Lower Significant Bit); its signal is specific to the edge locations of each image. Superimposing the two triggers helps balance effectiveness and stealthiness. On this basis, the CLIP (Contrastive Language-Image Pre-training) model is introduced to construct the supervision signal that drives composite-trigger training from joint visual and textual features. The pre-trained CLIP model has strong generalization ability and can guide the composite trigger to absorb heterogeneous textual features, weakening image content features and further enhancing trigger effectiveness. Experiments are conducted on three datasets: CIFAR-10, ImageNet, and GTSRB. The results show that the proposed method evades detection by backdoor defense techniques and surpasses the second-best method by 2.48 percentage points on average in attack success rate; on the four metrics of peak signal-to-noise ratio, structural similarity, gradient magnitude similarity deviation, and learned perceptual image patch similarity, it surpasses the second-best method by 10.61%, 0.31%, 68.44%, and 46.38% on average, respectively. Ablation results verify the advantage of jointly using visual and textual features to guide composite-trigger training, as well as the contributions of the universal and personalized triggers to the effectiveness and stealthiness of the backdoor attack.
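The block-wise congruence constraint described above can be sketched as follows. This is a minimal illustrative stand-in, not the authors' learned trigger: the function name `make_block_congruent`, the 4×4 block size, and the choice of bit weight 2 are assumptions made for demonstration only.

```python
import numpy as np

def make_block_congruent(img, block=4, weight=2):
    """Illustrative sketch: force every pixel in each block x block patch
    to share the same residue modulo `weight` (weight=2 gives a uniform
    parity per block), so a model could learn the congruence pattern."""
    out = img.astype(np.int32).copy()
    h, w = out.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = out[y:y+block, x:x+block]        # view into `out`
            r = int(round(patch.mean())) % weight    # target residue for this block
            patch -= (patch - r) % weight            # shift each pixel down by < weight
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because each pixel moves by at most `weight - 1` intensity levels, the perturbation stays visually small, which mirrors the stealthiness motivation in the abstract.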
CLC Number: