Computer Science (计算机科学) ›› 2026, Vol. 53 ›› Issue (5): 404-418. doi: 10.11896/jsjkx.250600065
GUO Jingchen (郭靖臣), YANG Kuiwu (杨奎武), DING Mengdi (丁梦迪), WEI Jianghong (魏江宏)
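Abstract: The Vision Transformer (ViT) is a novel architecture that overcomes the local receptive field limitation of traditional convolutional neural networks (CNNs), and its global modeling capability has led to breakthroughs in computer vision. As ViT applications in security-related domains surge, its structural differences from CNNs sharply reduce the effectiveness of traditional adversarial attacks. As a result, the true vulnerability of ViTs is masked and the development of defense mechanisms lags behind, and the model security risks posed by adversarial attacks are making this area a research hotspot. This paper first systematically reviews the core progress in adversarial attack methods against ViTs and analyzes how ViT-specific structures, such as image patch partitioning, positional encoding, and the attention mechanism, affect adversarial example attacks. Second, it classifies adversarial attack methods targeting ViTs, dividing the key existing methods into white-box attacks, transfer-based black-box attacks, and decision-based black-box attacks, and focuses on research progress in five categories of transfer-based black-box attacks: optimization attacks targeting model structure, input-transformation-based attacks, integrated-gradient-based attacks, attacks targeting downstream tasks, and attacks targeting model alignment. It then examines in depth how these methods have progressively evolved in perturbation efficiency and cross-model transferability, systematically summarizes the core strengths and weaknesses of each class of attack, and reveals the evolutionary logic of attack techniques and the underlying model weaknesses, providing a reference for innovation in attack and defense techniques. Finally, future research directions are analyzed and discussed.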