Computer Science ›› 2025, Vol. 52 ›› Issue (10): 423-432. doi: 10.11896/jsjkx.240700202

• Information Security •

  • Corresponding author: WANG Juan(jjmao2009@163.com)
  • About author: (flyxiongxi@gmail.com)

DLSF:A Textual Adversarial Attack Method Based on Dual-level Semantic Filtering

XIONG Xi1,2,3, DING Guangzheng1,2,3, WANG Juan1,2,3, ZHANG Shuai4   

  1. 1 School of Cybersecurity(Xin Gu Industrial College),Chengdu University of Information Technology,Chengdu 610225,China
    2 Advanced Cryptography and System Security Key Laboratory of Sichuan Province(Xin Gu Industrial College),Chengdu 610225,China
    3 National Engineering Research Center for Advanced Microprocessor Technologies(Industrial Control and Security Sub-center),Chengdu 610225,China
    4 School of Information and Electronics,Beijing Institute of Technology,Beijing 100081,China
  • Received:2024-10-10 Revised:2025-03-15 Online:2025-10-15 Published:2025-10-14
  • About author:XIONG Xi,born in 1983,Ph.D,professor,is a senior member of CCF(No.68561S).His main research interests include information security,natural language processing and information extraction.
    WANG Juan,born in 1981,Ph.D,professor.Her main research interests include network and IoT security,AI security and industrial control system security.
  • Supported by:
    Sichuan Province Science and Technology Program(2024NSFSC2043,2024NSFSC1744,2024NSFSC1185) and Foundation for Humanities and Social Sciences of Ministry of Education of China(22YJAZH120).


Abstract: In the field of commercial applications,deep learning-based text models play a crucial role but are also susceptible to adversarial samples,such as the incorporation of confusing vocabulary into reviews leading to erroneous model responses.A strong attack algorithm can assess the robustness of such models and test the effectiveness of existing defense methods,thereby reducing potential harm from adversarial samples.Considering the prevalent issues of low-quality adversarial texts and inefficient attack methods in black-box settings,this paper proposes a dual-level semantic filtering attack algorithm based on word substitution.This algorithm amalgamates existing methodologies for assembling candidate word sets,effectively eliminates interference from irrelevant words,and thereby enriches the variety and quantity of candidate words.It employs a dual-filter beam search strategy during the iterative search process,which not only reduces the frequency of model access,but also guarantees the acquisition of optimal adversarial texts.Experimental results on text classification and natural language inference tasks demonstrate that this method significantly enhances the quality of adversarial texts and attack efficiency.Specifically,the attack success rate on the IMDB dataset reaches 99.7%,semantic similarity reaches 0.975,with the number of model accesses being only 17% of those required by TAMPERS.Furthermore,after adversarial augmentation training with adversarial samples,the target model's attack success rate on the MR dataset decreases from 92.9% to 65.4%,further confirming that DLSF effectively enhances the robustness of the target model.

Key words: Textual adversarial attack,Black-box attack,Beam search,Robustness,Text model
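The attack pipeline the abstract describes — filtered candidate word sets plus a dual-filter beam search that trades fewer model queries for adversarial-text quality — can be illustrated with a deliberately tiny sketch. Everything below (the toy classifier, the hand-written candidate table with invented similarity scores, the overlap-based sentence similarity) is a hypothetical stand-in, not the paper's actual components:

```python
# Illustrative sketch only: a toy dual-filter beam-search word-substitution
# attack in the spirit of DLSF. The classifier, candidate table and both
# similarity measures are made-up stand-ins, not the paper's components.

def toy_classifier(words):
    """Toy victim 'model': probability that the text is positive."""
    positive = {"good", "great"}
    return sum(w in positive for w in words) / max(len(words), 1)

# Toy candidate sets with invented word-level similarity scores.
# (DLSF merges several existing candidate-set sources; this table is not.)
CANDIDATES = {
    "good": [("fine", 0.9), ("nice", 0.8), ("so-so", 0.3)],
    "great": [("grand", 0.85), ("huge", 0.3)],
    "movie": [("film", 0.95)],
}

def sentence_sim(orig, adv):
    """Toy sentence-level similarity: fraction of unchanged positions."""
    return sum(o == a for o, a in zip(orig, adv)) / len(orig)

def dlsf_like_attack(words, beam_width=2, word_thr=0.5, sent_thr=0.5):
    """Beam search over substitutions with two semantic filters."""
    orig = list(words)
    beam = [orig]
    for i, w in enumerate(orig):
        # Filter 1 (word level): drop weakly related candidates.
        cands = [c for c, sim in CANDIDATES.get(w, []) if sim >= word_thr]
        if not cands:
            continue
        expanded = []
        for hyp in beam:
            for c in cands + [hyp[i]]:  # keeping the current word is allowed
                new = hyp[:i] + [c] + hyp[i + 1:]
                # Filter 2 (sentence level): stay close to the original text.
                if sentence_sim(orig, new) >= sent_thr:
                    expanded.append(new)
        if expanded:
            # Keep the beam_width hypotheses that hurt the model most.
            expanded.sort(key=toy_classifier)
            beam = expanded[:beam_width]
    return beam[0]

adv = dlsf_like_attack(["good", "great", "movie"])
```

In the paper's setting, the two filters would instead rely on word-embedding and sentence-encoder similarity, and the search would terminate once the victim model's predicted label flips; this sketch only shows how the two-level filtering prunes the beam so that fewer hypotheses ever reach the model.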

CLC Number: TP391
[1]HE K,ZHANG X,REN S,et al.Deep Residual Learning for Image Recognition[J].arXiv:1512.03385,2015.
[2]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[J].arXiv:1810.04805,2018.
[3]MA P,HALIASSOS A,FERNANDEZ-LOPEZ A,et al.Auto-AVSR:Audio-Visual Speech Recognition with Automatic Labels[C]//2023 IEEE International Conference on Acoustics,Speech and Signal Processing.2023:1-5.
[4]SZEGEDY C,ZAREMBA W,SUTSKEVER I,et al.Intriguing properties of neural networks[J].arXiv:1312.6199,2013.
[5]GOODFELLOW I J,SHLENS J,SZEGEDY C.Explaining and Harnessing Adversarial Examples[J].arXiv:1412.6572,2014.
[6]MOOSAVI-DEZFOOLI S M,FAWZI A,FROSSARD P.DeepFool:A Simple and Accurate Method to Fool Deep Neural Networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2016:2574-2582.
[7]CHEN P Y,ZHANG H,SHARMA Y,et al.ZOO:Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models[C]//Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security.ACM,2017:15-26.
[8]YUAN X,HE P,ZHU Q,et al.Adversarial Examples:Attacks and Defenses for Deep Learning[J].IEEE Transactions on Neural Networks and Learning Systems,2019,30(9):2805-2824.
[9]HOSSEINI H,KANNAN S,ZHANG B,et al.Deceiving Google's Perspective API Built for Detecting Toxic Comments[J].arXiv:1702.08138,2017.
[10]PAPERNOT N,MCDANIEL P,SWAMI A,et al.Crafting Adversarial Input Sequences for Recurrent Neural Networks[J].arXiv:1604.08275,2016.
[11]LIU H,XU Z,ZHANG X,et al.SSPAttack:A Simple and Sweet Paradigm for Black-Box Hard-Label Textual Adversarial Attack[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2023:13228-13235.
[12]ZHAO X,ZHANG L,XU D,et al.Generating Textual Adversaries with Minimal Perturbation[J].arXiv:2211.06571,2022.
[13]MAHESHWARY R,MAHESHWARY S,PUDI V.Generating Natural Language Attacks in a Hard Label Black Box Setting[J].arXiv:2012.14956,2020.
[14]LI L,MA R,GUO Q,et al.BERT-ATTACK:Adversarial Attack Against BERT Using BERT[J].arXiv:2004.09984,2020.
[15]ZHU H,ZHAO Q,WU Y.BeamAttack:Generating High-quality Textual Adversarial Examples through Beam Search and Mixed Semantic Spaces[J].arXiv:2303.07199,2023.
[16]YOO J Y,MORRIS J X,LIFLAND E,et al.Searching for a Search Method:Benchmarking Search Algorithms for Generating NLP Adversarial Examples[J].arXiv:2009.06368,2020.
[17]JIN D,JIN Z,ZHOU J T,et al.Is BERT Really Robust?A Strong Baseline for Natural Language Attack on Text Classification and Entailment[C]//National Conference on Artificial Intelligence.2020:123-131.
[18]REN S,DENG Y,HE K,et al.Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:1085-1097.
[19]ALZANTOT M,SHARMA Y,ELGOHARY A,et al.Generating Natural Language Adversarial Examples[J].arXiv:1804.07998,2018.
[20]ZANG Y,QI F,YANG C,et al.Word-level Textual Adversarial Attacking as Combinatorial Optimization[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020:6066-6080.
[21]CHOI Y,KIM H,LEE J H.TABS:Efficient Textual Adversarial Attack for Pre-trained NL Code Model Using Semantic Beam Search[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.2022:5490-5498.
[22]DONG Z,DONG Q,HAO C.HowNet and Its Computation of Meaning[C]//Proceedings of the 23rd International Conference on Computational Linguistics:Demonstrations.2010:53-56.
[23]MILLER G A.WordNet:a lexical database for English[J].Communications of the ACM,1995,38(11):39-41.
[24]MRKŠIĆ N,SÉAGHDHA D Ó,THOMSON B,et al.Counter-fitting Word Vectors to Linguistic Constraints[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2016:142-148.
[25]GARG S,RAMAKRISHNAN G.BAE:BERT-based Adversarial Examples for Text Classification[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.2020:6174-6181.
[26]REIMERS N,GUREVYCH I.Sentence-BERT:Sentence Embeddings using Siamese BERT-Networks[J].arXiv:1908.10084,2019.
[27]SOCHER R,PERELYGIN A,WU J,et al.Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank[EB/OL].https://aclanthology.org/D13-1170.pdf.
[28]PANG B,LEE L.Seeing stars:Exploiting class relationships for sentiment categorization with respect to rating scales[C]//Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics.2005:115-124.
[29]ZHANG X,LECUN Y.Text Understanding from Scratch[J].arXiv:1502.01710,2016.
[30]MAAS A L,DALY R E,PHAM P T,et al.Learning Word Vectors for Sentiment Analysis[C]//Annual Meeting of the Association for Computational Linguistics.2011.
[31]BOWMAN S R,ANGELI G,POTTS C,et al.A large annotated corpus for learning natural language inference[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.2015:632-642.
[32]WILLIAMS A,NANGIA N,BOWMAN S.A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference[EB/OL].https://aclanthology.org/N18-1101.pdf.
[33]CER D,YANG Y,KONG S Y,et al.Universal Sentence Encoder[J].arXiv:1803.11175,2018.