Computer Science ›› 2025, Vol. 52 ›› Issue (12): 428-434. doi: 10.11896/jsjkx.250500005
• Information Security •
张錋, 张道娟, 陈凯, 赵宇飞, 张英杰, 费克雄
ZHANG Peng, ZHANG Daojuan, CHEN Kai, ZHAO Yufei, ZHANG Yingjie, FEI Kexiong
Abstract: Although natural language processing (NLP) models perform well on a wide range of text classification tasks, they remain highly vulnerable to adversarial attacks. To address this problem, this paper proposes a novel retrieval-augmented classification method that effectively improves model robustness in adversarial settings. The method introduces a k-nearest-neighbor (KNN) retrieval mechanism that combines the model's own label prediction with the label distribution of retrieved similar samples, enabling the model to make more stable decisions under attack. A key innovation is the decoupling of the representation spaces used for classification and retrieval, which avoids the performance degradation and training instability caused by a shared representation. Experiments on multiple benchmark datasets and diverse adversarial attack scenarios show that the proposed method significantly improves robustness: it reduces the drop in model accuracy under adversarial attack by 30 to 40 percentage points, and it maintains relatively stable performance even under strong attacks. Extensive experiments further validate the effectiveness of the method, demonstrating that retrieval-augmented classification and decoupled representations are of great significance for building more reliable systems.
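The abstract's core mechanism — interpolating the classifier's own softmax prediction with a label distribution drawn from retrieved nearest neighbors, while keeping the retrieval representation separate from the classification head — can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the names (`retrieval_augmented_predict`, `lam`), the Euclidean distance metric, and the softmax-over-negative-distance neighbor weighting are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def knn_label_distribution(query, memory, num_classes, k=3, temperature=1.0):
    """Label distribution from the k nearest stored (vector, label) pairs,
    weighted by a softmax over negative Euclidean distances."""
    scored = sorted((math.dist(query, vec), label) for vec, label in memory)
    top = scored[:k]
    weights = softmax([-d / temperature for d, _ in top])
    dist = [0.0] * num_classes
    for w, (_, label) in zip(weights, top):
        dist[label] += w
    return dist

def retrieval_augmented_predict(logits, retrieval_vec, memory,
                                num_classes, lam=0.5, k=3):
    """Interpolate the classifier's softmax with the KNN label distribution.
    `logits` come from the classification head; `retrieval_vec` comes from a
    separate retrieval encoder (the decoupled-representation design).
    lam=0 recovers the plain classifier; lam=1 is pure retrieval."""
    p_model = softmax(logits)
    p_knn = knn_label_distribution(retrieval_vec, memory, num_classes, k=k)
    return [lam * pk + (1 - lam) * pm for pm, pk in zip(p_model, p_knn)]
```

Under this sketch, an adversarial perturbation that flips the classifier's logits can be outvoted by clean neighbors retrieved from the memory, since the two terms are computed in different representation spaces.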