Computer Science ›› 2025, Vol. 52 ›› Issue (12): 428-434.doi: 10.11896/jsjkx.250500005

• Information Security •

Enhancing NLP Robustness Against Attacks with Retrieval-augmented Classification and Decoupled Representations

ZHANG Peng, ZHANG Daojuan, CHEN Kai, ZHAO Yufei, ZHANG Yingjie, FEI Kexiong   

  1. State Grid Laboratory of Power Cyber-Security Protection and Monitoring Technology, China Electric Power Research Institute Co., Ltd., Beijing 102209, China
  • Received:2025-05-06 Revised:2025-09-03 Online:2025-12-15 Published:2025-12-09
  • About author:ZHANG Peng,born in 1981,master,senior engineer.His main research interests include AI security,intelligent attack and defense,and threat detection.
    ZHANG Daojuan,born in 1989,Ph.D,senior engineer.Her main research interests include AI security,intelligent attack and defense,and threat detection.
  • Supported by:
    This work was supported by the Science and Technology Project of State Grid Corporation of China:Research on Attack and Defense Methods for Electric Power Artificial Intelligence Models (5700-202358708A-3-3-JC).

Abstract: While NLP models have achieved state-of-the-art performance across various classification tasks, their vulnerability to adversarial attacks remains a significant challenge. This paper introduces a novel retrieval-augmented classification approach designed to enhance model robustness against such attacks. By leveraging a KNN retrieval mechanism, the method interpolates the model's predicted label distribution with the label distributions of retrieved instances, strengthening the decision-making process in adversarial settings. A key innovation of this work is the decoupling of the representation spaces used for classification and retrieval, which mitigates the performance degradation and training instability caused by shared representations. The proposed method is evaluated on a range of benchmark datasets under various adversarial attack scenarios and demonstrates substantial improvements in robustness. Specifically, the accuracy drops typically observed under adversarial conditions are reduced by 30 to 40 percentage points, and the approach maintains stable performance even under intense attacks. Comprehensive experiments validate the effectiveness of the proposed method, highlighting the contributions of both retrieval-augmented classification and decoupled representations to building more resilient and reliable systems.
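The abstract does not give implementation details, but the core step it describes — blending the classifier's predicted label distribution with a distribution built from KNN-retrieved instances — can be sketched roughly as follows. All names (`knn_label_distribution`, `lam`) and the distance-softmax neighbor weighting are illustrative assumptions, not the paper's actual formulation; in the proposed method the retrieval vectors would additionally be drawn from a representation space decoupled from the one feeding the classifier.

```python
import numpy as np

def knn_label_distribution(query, store_vecs, store_labels, num_classes, k=8, temperature=1.0):
    """Build a label distribution from the k nearest stored instances,
    weighting each neighbor by a softmax over its negative distance.
    store_vecs would come from the (decoupled) retrieval representation space."""
    dists = np.linalg.norm(store_vecs - query, axis=1)
    nn = np.argsort(dists)[:k]              # indices of the k nearest neighbors
    w = np.exp(-dists[nn] / temperature)    # closer neighbors get larger weight
    w /= w.sum()
    p = np.zeros(num_classes)
    for weight, idx in zip(w, nn):
        p[store_labels[idx]] += weight      # accumulate weight on each neighbor's label
    return p

def interpolate(p_model, p_knn, lam=0.5):
    """Blend the classifier's softmax output with the retrieval distribution."""
    return lam * np.asarray(p_model) + (1.0 - lam) * np.asarray(p_knn)
```

With a small datastore, a query sitting among same-label neighbors pulls the final distribution toward that label even if the classifier's own output has been perturbed by an attack, which is the intuition behind the robustness gains the abstract reports.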

Key words: Adversarial defense, Retrieval-augmented classification, Natural language processing, Model robustness, KNN retrieval, Representation learning

CLC Number: TP391
[1]LU S Y,LIU M Z,YIN L R,et al.The multi-modal fusion in visual question answering:a review of attention mechanisms[J].PeerJ Computer Science,2023,9:e1400.
[2]OMAR R,MANGUKIYA O,KALNIS P,et al.ChatGPT versus traditional question answering for knowledge graphs:Current status and future directions towards knowledge graph chatbots[J].arXiv:2302.06466,2023.
[3]ZHUANG Y C,YU Y,WANG K,et al.Toolqa:A dataset for llm question answering with external tools[J].Advances in Neural Information Processing Systems,2023,36:50117-50143.
[4]LI B Z,DONATELLI L,KOLLER A,et al.Slog:A structural generalization benchmark for semantic parsing[J].arXiv:2310.15040,2023.
[5]ZHUO T Y,LI Z,HUANG Y J,et al.On robustness of prompt-based semantic parsing with large pre-trained language model:An empirical study on codex[J].arXiv:2301.12868,2023.
[6]CHEN Y R,ZHANG S Y,QI G L,et al.Parameterizing context:Unleashing the power of parameter-efficient fine-tuning and in-context tuning for continual table semantic parsing[C]//Advances in Neural Information Processing Systems.2024.
[7]HUI B Y,YANG J,CUI Z Y,et al.Qwen2.5-coder technical report[J].arXiv:2409.12186,2024.
[8]LIU S K,CHAI L Z,YANG J,et al.Mdeval:Massively multilingual code debugging[J].arXiv:2411.02310,2024.
[9]CHAI L Z,LIU S K,YANG J,et al.Mceval:Massively multilingual code evaluation[J].arXiv:2406.07436,2024.
[10]GARRIDO-MERCHAN E C,GOZALO-BRIZUELA R,GONZALEZ-CARVAJAL S.Comparing BERT against traditional machine learning models in text classification[J].Journal of Computational and Cognitive Engineering,2023,2(4):352-356.
[11]BEKAMIRI H,HAIN D S,JUROWETZKI R.Patentsberta:A deep nlp based hybrid model for patent distance and classification using augmented sbert[J].Technological Forecasting and Social Change,2024,206:123536.
[12]OLUSEGUN R,OLADUNNI T,AUDU H,et al.Text mining and emotion classification on monkeypox twitter dataset:A deep learning-natural language processing(nlp) approach[J].IEEE Access,2023,11:49882-49894.
[13]SHAYEGANI E,AL MAMUN M A,FU Y,et al.Survey of vulnerabilities in large language models revealed by adversarial attacks[J].arXiv:2310.10844,2023.
[14]LIU S B,LIU G R,ZHU B R,et al.Balancing innovation and privacy:Data security strategies in natural language processing applications[C]//2024 5th International Conference on Machine Learning and Computer Application(ICMLCA).IEEE,2024:609-613.
[15]TAN K L,LEE C P,LIM K M.A survey of sentiment analysis:Approaches,datasets,and future research[J].Applied Sciences,2023,13(7):4550.
[16]KOZYREVA A,HERZOG S M,LEWANDOWSKY S,et al.Resolving content moderation dilemmas between free speech and harmful misinformation[J].Proceedings of the National Academy of Sciences,2023.
[17]MOTIE S,RAAHEMI B.Financial fraud detection using graph neural networks:A systematic review[J].Expert Systems with Applications,2024,240:122156.
[18]GAO Y,CAO Z W,MIAO Z J,et al.Efficient k-nearest-neighbor machine translation with dynamic retrieval[J].arXiv:2406.06073,2024.
[19]GUO G D,WANG H,BELL D,et al.Knn model-based approach in classification[C]//On the Move to Meaningful Internet Systems 2003:CoopIS,DOA,and ODBASE.Berlin:Springer,2003:986-996.
[20]KHANDELWAL U,LEVY O,JURAFSKY D,et al.Generalization through Memorization:Nearest Neighbor Language Models[C]//International Conference on Learning Representations(ICLR).2020.
[21]KHANDELWAL U,FAN A,JURAFSKY D,et al.Nearest neighbor machine translation[C]//International Conference on Learning Representations(ICLR).2021.
[22]SU X A,WANG R,DAI X Y.Contrastive learning-enhanced nearest neighbor mechanism for multi-label text classification[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.ACL,2022:672-679.
[23]GOODFELLOW I J,SHLENS J,SZEGEDY C.Explaining and harnessing adversarial examples[C]//Proceedings of the International Conference on Learning Representations(ICLR).2015.
[24]MADRY A,MAKELOV A,SCHMIDT L,et al.Towards deep learning models resistant to adversarial attacks[C]//Proceedings of the International Conference on Learning Representations(ICLR).2018.
[25]HU H,RICHARDSON K,XU L,et al.OCNLI:Original Chinese Natural Language Inference[C]//Findings of the Association for Computational Linguistics:EMNLP 2020.ACL,2020:3512-3526.
[26]WILLIAMS A,NANGIA N,BOWMAN S.A broad-coverage challenge corpus for sentence understanding through inference[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.ACL,2018:1112-1122.
[27]CUI Y M,CHE W X,LIU T,et al.Revisiting pre-trained models for Chinese natural language processing[C]//Findings of the Association for Computational Linguistics:EMNLP 2020.ACL,2020:657-668.
[28]CUI Y M,CHE W X,LIU T,et al.Pre-training with whole word masking for Chinese BERT[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2021,29:3504-3514.
[29]LIU Y H,OTT M,GOYAL N,et al.RoBERTa:A robustly optimized BERT pretraining approach[J].arXiv:1907.11692,2019.
[30]DONG Y P,LIAO F Z,PANG T Y,et al.Boosting adversarial attacks with momentum[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:9185-9193.
[31]YE M C,CHEN J,MIAO C L,et al.Leapattack:Hard-label adversarial attack on text via gradient-based optimization[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.2022:2307-2315.
[32]LEWIS P,PEREZ E,PIKTUS A,et al.Retrieval-augmented generation for knowledge-intensive nlp tasks[J].Advances in Neural Information Processing Systems,2020,33:9459-9474.
[33]LIU S J,WU J,BAO J Y,et al.Towards a robust retrieval-based summarization system[J].arXiv:2403.19889,2024.
[34]SIRIWARDHANA S,WEERASEKERA R,WEN E T,et al.Improving the domain adaptation of retrieval augmented generation models for open domain question answering[J].Transactions of the Association for Computational Linguistics,2023,11:1-17.
[35]ZHU Y H,REN C Y,XIE S Y,et al.Realm:Rag-driven enhancement of multimodal electronic health records analysis via large language models[J].arXiv:2402.07016,2024.
[36]WU S Y,XIONG Y,CUI Y F,et al.Retrieval-augmented generation for natural language processing:A survey[J].arXiv:2407.13193,2024.
[37]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2019:4171-4186.
[38]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Processing Systems,2020,33:1877-1901.
[39]RIBEIRO M T,WU T,GUESTRIN C,et al.Beyond accuracy:Behavioral testing of NLP models with CheckList[J].arXiv:2005.04118,2020.
[40]YOO K Y,KIM J,JANG J,et al.Detection of word adversarial examples in text classification:Benchmark and baseline via robust density estimation[J].arXiv:2203.01677,2022.