Computer Science ›› 2025, Vol. 52 ›› Issue (10): 423-432. doi: 10.11896/jsjkx.240700202
• Information Security •
XIONG Xi1,2,3, DING Guangzheng1,2,3, WANG Juan1,2,3, ZHANG Shuai4
Abstract: Deep learning-based text models play a key role in commercial applications, yet they have been shown to be vulnerable to adversarial examples; for instance, inserting obfuscated words into a review can cause a model to respond incorrectly. A good textual attack algorithm can not only evaluate the robustness of such models but also test the effectiveness of existing defenses, thereby reducing the potential harm of adversarial examples. Since current methods for generating adversarial text in black-box settings generally produce adversarial text of low quality and attack inefficiently, this paper proposes a word-substitution attack algorithm with Dual-level Semantic Filtering (DLSF). It combines existing approaches to building the candidate word set, enriching the categories and number of candidates while effectively avoiding interference from irrelevant words in the set. During the iterative search it adopts a dual-filtering beam search strategy, which reduces the number of queries to the target model while still securing the best adversarial text. Experimental results on text classification and natural language inference tasks show that the method improves the quality of adversarial text while significantly raising attack efficiency. Specifically, on the IMDB dataset the attack success rate reaches 99.7% with a semantic similarity of 0.975, while the number of model queries is only 17% of that of TAMPERS. In addition, after the target model is adversarially trained on the generated adversarial examples, the attack success rate on the MR dataset drops from 92.9% to 65.4%, further verifying that DLSF effectively improves the robustness of text models.
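To make the dual-filtering beam search concrete, the following is a minimal Python sketch, not the authors' released implementation. All helper names (victim_confidence, get_candidates, word_sim, sent_sim) and the thresholds are illustrative assumptions; in the paper, the candidate set is merged from multiple existing sources, and sentence-level similarity would typically come from a sentence encoder.

```python
from typing import Callable, List, Optional


def dlsf_attack(
    words: List[str],
    victim_confidence: Callable[[str], float],   # black-box P(true label | text)
    get_candidates: Callable[[str], List[str]],  # merged candidate set for a word
    word_sim: Callable[[str, str], float],       # word-level semantic similarity
    sent_sim: Callable[[str, str], float],       # sentence-level semantic similarity
    beam_width: int = 5,
    word_threshold: float = 0.8,
    sent_threshold: float = 0.9,
) -> Optional[List[str]]:
    """Beam search over word substitutions with two semantic filters,
    both applied before the victim model is ever queried."""
    original = " ".join(words)
    beam = [list(words)]
    for i in range(len(words)):
        # Keeping position i unchanged is always an option, so the search
        # is not forced to substitute every word.
        expanded = list(beam)
        for text in beam:
            for cand in get_candidates(words[i]):
                # Filter 1 (word level): discard substitutes that drift
                # from the meaning of the word they replace.
                if word_sim(words[i], cand) < word_threshold:
                    continue
                new_text = text[:i] + [cand] + text[i + 1:]
                # Filter 2 (sentence level): discard rewrites that drift
                # from the meaning of the original sentence.
                if sent_sim(original, " ".join(new_text)) < sent_threshold:
                    continue
                expanded.append(new_text)
        # Only texts that survived both filters reach the black-box model;
        # keep the beam_width texts with the lowest true-label confidence.
        scored = sorted(expanded, key=lambda t: victim_confidence(" ".join(t)))
        beam = scored[:beam_width]
        if victim_confidence(" ".join(beam[0])) < 0.5:
            return beam[0]  # the label has flipped: attack succeeded
    return None  # no adversarial text found within the search budget
```

Filtering candidates twice before any model query is what drives down the query count reported above, and the beam keeps several partial substitutions alive so that one locally suboptimal swap does not end the search.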