Computer Science ›› 2025, Vol. 52 ›› Issue (10): 423-432.doi: 10.11896/jsjkx.240700202

• Information Security •

DLSF:A Textual Adversarial Attack Method Based on Dual-level Semantic Filtering

XIONG Xi1,2,3, DING Guangzheng1,2,3, WANG Juan1,2,3, ZHANG Shuai4   

  1 School of Cybersecurity(Xin Gu Industrial College),Chengdu University of Information Technology,Chengdu 610225,China
    2 Advanced Cryptography and System Security Key Laboratory(Xin Gu Industrial College),Chengdu 610225,China
    3 SUGON Industrial Control and Security Center,Chengdu 610225,China
    4 School of Information and Electronics,Beijing Institute of Technology,Beijing 100081,China
  • Received:2024-10-10 Revised:2025-03-15 Online:2025-10-15 Published:2025-10-14
  • About author:XIONG Xi,born in 1983,Ph.D,professor,is a senior member of CCF(No.68561S).His main research interests include information security,natural language processing and information extraction.
    WANG Juan,born in 1981,Ph.D,professor.Her main research interests include network and IoT security,AI security and industrial control system security.
  • Supported by:
    Sichuan Province Science and Technology Program(2024NSFSC2043,2024NSFSC1744,2024NSFSC1185) and Foundation for Humanities and Social Sciences of Ministry of Education of China(22YJAZH120).

Abstract: In the field of commercial applications,deep learning-based text models play a crucial role but are also susceptible to adversarial samples,such as the incorporation of confusing vocabulary into reviews leading to erroneous model responses.A strong attack algorithm can assess the robustness of such models and test the effectiveness of existing defense methods,thereby reducing potential harms from adversarial samples.Considering the prevalent issues of low-quality adversarial texts and inefficient attack methods in black-box settings,this paper proposes a dual-level semantic filtering attack algorithm based on word substitution.This algorithm combines existing methods for assembling candidate word sets,effectively eliminates interference from irrelevant words,and thereby enriches the variety and quantity of candidate words.It employs a dual-filter beam search strategy during the iterative search process,which not only reduces the number of model accesses but also guarantees the acquisition of optimal adversarial texts.Experimental results on text classification and natural language inference tasks demonstrate that this method significantly enhances the quality of adversarial texts and attack efficiency.Specifically,the attack success rate on the IMDB dataset reaches 99.7%,semantic similarity reaches 0.975,and the number of model accesses is only 17% of that required by TAMPERS.Furthermore,after adversarial augmentation training with adversarial samples,the attack success rate against the target model on the MR dataset decreases from 92.9% to 65.4%,further confirming that DLSF effectively enhances the robustness of the target model.
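The general idea described above (word substitution guided by a beam search with two filters: a semantic one and a model-score one) can be sketched as a toy example. This is not the authors' DLSF implementation: the keyword "victim model",the synonym table and the token-overlap similarity measure below are all hypothetical stand-ins used only to illustrate the search strategy in a black-box setting where only model outputs and a query counter are available.

```python
# Toy word-substitution attack with a dual-filter beam search.
# All components (victim model, synonym table, similarity) are
# illustrative stand-ins, not the DLSF method itself.

SYNONYMS = {
    "good": ["fine", "nice"],
    "great": ["grand", "fine"],
    "movie": ["film", "picture"],
}

POSITIVE_WORDS = {"good", "great", "excellent"}

QUERY_COUNT = 0  # number of accesses to the black-box victim model


def victim_score(tokens):
    """Stand-in black-box classifier: fraction of positive keywords."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    return sum(t in POSITIVE_WORDS for t in tokens) / len(tokens)


def similarity(orig, cand):
    """Crude semantic filter: fraction of tokens left unchanged."""
    return sum(a == b for a, b in zip(orig, cand)) / len(orig)


def beam_attack(tokens, beam_width=2, sim_threshold=0.3):
    """Per-position beam search with two filters:
    1) semantic filter: discard candidates too dissimilar to the original;
    2) score filter: keep only the beam_width texts that most
       reduce the victim model's score (fewer queries than exhaustive search)."""
    beam = [list(tokens)]
    for i, word in enumerate(tokens):
        candidates = []
        for text in beam:
            for sub in [word] + SYNONYMS.get(word, []):
                new = text[:i] + [sub] + text[i + 1:]
                if similarity(tokens, new) >= sim_threshold:  # filter 1
                    candidates.append(new)
        candidates.sort(key=victim_score)                     # filter 2
        beam = candidates[:beam_width]
    return beam[0]


original = ["good", "great", "movie"]
adversarial = beam_attack(original)
print(adversarial)  # a perturbed text with a lower victim score
```

Narrowing each step to `beam_width` survivors is what keeps the query count low relative to exhaustive substitution,while the similarity threshold plays the role of the semantic filter that preserves adversarial-text quality.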

Key words: Textual adversarial attack,Black-box attack,Beam search,Robustness,Text model

CLC Number: TP391