Computer Science ›› 2021, Vol. 48 ›› Issue (1): 258-267. doi: 10.11896/jsjkx.200500078

• Information Security •

Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing

TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin   

  1. School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  • Received: 2020-05-18  Revised: 2020-08-25  Online: 2021-01-15  Published: 2021-01-15
  • Corresponding author: WANG Bin-jun (wangbinjun@ppsuc.edu.cn)
  • About author: tongxindotnet@outlook.com
  • Supported by:
    2020 CCF-NSFOCUS "Kunpeng" Research Fund (CCF-NSFOCUS 2020011), Science and Technology Strengthening Police Basic Program of the Ministry of Public Security (2018GABJC03), Key Program of the National Social Science Foundation of China (20AZD114), Top Talent Training Special Funding Graduate Research and Innovation Project of People's Public Security University of China (2020ssky005), and the Scientific Research and Technological Innovation Project on Public Security Behavior of People's Public Security University of China

Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing

TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin   

  1. School of Information and Cyber Security,People's Public Security University of China,Beijing 100038,China
  • Received:2020-05-18 Revised:2020-08-25 Online:2021-01-15 Published:2021-01-15
  • About author:TONG Xin,born in 1995,postgraduate,is a member of China Computer Federation.His main research interests include adversarial examples and natural language processing.
    WANG Bin-jun,born in 1962,professor,Ph.D. supervisor,is a member of China Computer Federation.His main research interests include natural language processing and information security.
  • Supported by:
    2020 CCF-Nsfocus “Kunpeng” Research Fund(CCF-NSFOCUS 2020011),Science and Technology Strengthening Police Basic Program of Ministry of Public Security(2018GABJC03),Key Program of the National Social Science Foundation of China(20AZD114),Top Talent Training Special Funding Graduate Research and Innovation Project of People's Public Security University of China(2020ssky005),and Scientific Research and Technological Innovation on Public Security Behavior of People's Public Security University of China.

Abstract: Deep learning models have been proven vulnerable and easily attacked by adversarial examples, yet current research on adversarial examples focuses mainly on computer vision and neglects the security of natural language processing (NLP) models. Since NLP faces the same adversarial-example risk, this paper first clarifies the concepts related to adversarial examples, then analyzes the causes of vulnerability in deep-learning-based NLP models, including their complex structure, hard-to-inspect training process, and naive underlying principles. It further describes the characteristics, classification, and evaluation metrics of textual adversarial examples, and introduces the typical tasks and datasets involved in adversarial research in this field. Next, it reviews mainstream textual adversarial example generation techniques by perturbation level, covering char-level, word-level, sentence-level, and multi-level combinations, and summarizes the corresponding defense methods. Finally, it discusses the open problems facing both the attack and defense sides of NLP adversarial-example research and gives an outlook on future work.

Key words: Adversarial examples, Robustness, AI security, Deep learning, Natural language processing

Abstract: Deep learning models have been proven vulnerable and easily attacked by adversarial examples, but current research on adversarial examples mainly focuses on the field of computer vision and neglects the security of natural language processing (NLP) models. Since NLP faces the same risk from adversarial examples, this paper clarifies the concepts related to adversarial examples as a basis for further research. Firstly, it analyzes the causes of vulnerability, including the complex structure of deep-learning-based NLP models, their hard-to-inspect training process, and their naive underlying principles; it further elaborates the characteristics, classification, and evaluation metrics of textual adversarial examples, and introduces the typical tasks and classical datasets involved in adversarial-example research in NLP. Secondly, according to the perturbation level, it surveys mainstream char-level, word-level, sentence-level, and multi-level text adversarial example generation techniques. Moreover, it summarizes defense methods relevant to data, models, and inference, and compares their advantages and disadvantages. Finally, the pain points of both the attack and defense sides in the field of current NLP adversarial examples are further discussed and anticipated.
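To make the char-level perturbations discussed above concrete, the sketch below swaps two adjacent interior characters of randomly chosen words, the basic primitive behind typo-style attacks such as DeepWordBug. This is an illustrative example, not code from the paper; the function name and parameters are our own.

```python
import random

def char_level_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap two adjacent interior characters in a fraction of the words.

    Keeping the first and last characters intact preserves readability
    for humans while still shifting tokens out of a model's vocabulary.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible perturbation
    out = []
    for w in text.split():
        if len(w) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(w) - 2)      # interior position only
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap chars i and i+1
        out.append(w)
    return " ".join(out)

print(char_level_perturb("adversarial examples fool deep neural networks", rate=1.0))
```

Such perturbed inputs often remain legible to humans but map clean in-vocabulary tokens to unknown tokens, which is why char-level attacks degrade classifiers that lack a robust word-recognition front end.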

Key words: Adversarial examples, AI security, Deep learning, Natural language processing, Robustness

CLC Number: TP301
[1] HOCHREITER S,SCHMIDHUBER J.Long Short-Term Memory[J].Neural computation,1997,9(8):1735-1780.
[2] MIKOLOV T,CHEN K,CORRADO G,et al.Efficient Estimation of Word Representations in Vector Space[J].arXiv:1301.3781,2013.
[3] PENNINGTON J,SOCHER R,MANNING C.GloVe:Global Vectors for Word Representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).2014:1532-1543.
[4] DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding[J].arXiv:1810.04805,2018.
[5] YANG Z,DAI Z,YANG Y,et al.XLNet:Generalized Autoregressive Pretraining for Language Understanding[C]//Advances in Neural Information Processing Systems.2019:5754-5764.
[6] WANG W,WANG L,TANG B,et al.Towards a Robust Deep Neural Network in Text Domain A Survey[J].arXiv:1902.07285,2019.
[7] SZEGEDY C,ZAREMBA W,SUTSKEVER I,et al.Intriguing properties of neural networks[J].arXiv:1312.6199,2013.
[8] PAN W B,WANG X Y.Survey on Generating Adversarial Examples[J].Journal of Software,2020,31(1):67-81.
[9] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.
[10] NIVEN T,KAO H Y.Probing Neural Network Comprehension of Natural Language Arguments[J].arXiv:1907.07355,2019.
[11] KUSNER M,SUN Y,KOLKIN N,et al.From word embeddings to document distances[C]//International Conference on Machine Learning.2015:957-966.
[12] HUANG G,GUO C,KUSNER M J,et al.Supervised Word Mover's Distance[C]//Advances in Neural Information Processing Systems.2016:4862-4870.
[13] WU L.Word mover's embedding:From word2vec to document embedding[J].arXiv:1811.01713,2018.
[14] DONG Y,FU Q A,YANG X,et al.Benchmarking Adversarial Robustness[J].arXiv:1912.11852,2019.
[15] MICHEL P,LI X,NEUBIG G,et al.On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models[J].arXiv:1903.06620,2019.
[16] DEL CORSO G M,GULLI A,ROMANI F.Ranking a stream of news[C]//Proceedings of the 14th International Conference on World Wide Web.2005:97-106.
[17] SOCHER R,PERELYGIN A,WU J,et al.Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.2013:1631-1642.
[18] CETTOLO M,GIRARDI C,FEDERICO M.Wit3:Web inventory of transcribed and translated talks[C]//Conference of European Association for Machine Translation.2012:261-268.
[19] RAJPURKAR P,ZHANG J,LOPYREV K,et al.SQuAD:100 000+ Questions for Machine Comprehension of Text[J].arXiv:1606.05250,2016.
[20] RAJPURKAR P,JIA R,LIANG P.Know What You Don'tKnow:Unanswerable Questions for SQuAD[J].arXiv:1806.03822,2018.
[21] GOYAL Y,KHOT T,SUMMERS-STAY D,et al.Making the V in VQA matter:Elevating the role of image understanding in Visual Question Answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR).2017:6904-6913.
[22] BOWMAN S R,ANGELI G,POTTS C,et al.A large annotated corpus for learning natural language inference[J].arXiv:1508.05326,2015.
[23] WILLIAMS A,NANGIA N,BOWMAN S R.A broad-coverage challenge corpus for sentence understanding through inference[J].arXiv:1704.05426,2017.
[24] TJONG KIM SANG E F,DE MEULDER F.Introduction to the CoNLL-2003 shared task:Language-independent named entity recognition[J].arXiv:cs/0306050,2003.
[25] BELINKOV Y,BISK Y.Synthetic and natural noise both break neural machine translation[J].arXiv:1711.02173,2017.
[26] GAO J,LANCHANTIN J,SOFFA M L,et al.Black-box generation of adversarial text sequences to evade deep learning classifiers[C]//2018 IEEE Security and Privacy Workshops (SPW).IEEE,2018:50-56.
[27] WANG W Q,WANG R.Adversarial Examples Generation Approach for Tendency Classification on Chinese Texts[J].Journal of Software,2019,30(8):2415-2427.
[28] EBRAHIMI J,LOWD D,DOU D.On adversarial examples for character-level neural machine translation[J].arXiv:1806.09030,2018.
[29] EGER S,ŞAHIN G G,RÜCKLÉ A,et al.Text processing like humans do:Visually attacking and shielding NLP systems[J].arXiv:1903.11508,2019.
[30] PAPERNOT N,MCDANIEL P,SWAMI A,et al.Crafting adversarial input sequences for recurrent neural networks[C]//MILCOM 2016-2016 IEEE Military Communications Conference.IEEE,2016:49-54.
[31] GOODFELLOW I J,SHLENS J,SZEGEDY C.Explaining and harnessing adversarial examples[J].arXiv:1412.6572,2014.
[32] JIN D,JIN Z,ZHOU J T,et al.Is BERT Really Robust?A Strong Baseline for Natural Language Attack on Text Classification and Entailment[J].arXiv:1907.11932,2019.
[33] SAMANTA S,MEHTA S.Towards crafting text adversarial samples[J].arXiv:1707.02812,2017.
[34] SATO M,SUZUKI J,SHINDO H,et al.Interpretable adversarial perturbation in input embedding space for text[J].arXiv:1805.02917,2018.
[35] ZHANG H,ZHOU H,MIAO N,et al.Generating Fluent Adversarial Examples for Natural Languages[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:5564-5569.
[36] ALZANTOT M,SHARMA Y,ELGOHARY A,et al.Generating natural language adversarial examples[J].arXiv:1804.07998,2018.
[37] ZANG Y,YANG C,QI F,et al.Textual Adversarial Attack as Combinatorial Optimization[J].arXiv:1910.12196,2019.
[38] REN S,DENG Y,HE K,et al.Generating natural language adversarial examples through probability weighted word saliency[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:1085-1097.
[39] JIA R,LIANG P.Adversarial examples for evaluating reading comprehension systems[J].arXiv:1707.07328,2017.
[40] MINERVINI P,RIEDEL S.Adversarially regularising neural nli models to integrate logical background knowledge[J].arXiv:1808.08609,2018.
[41] CHENG Y,JIANG L,MACHEREY W.Robust neural machine translation with doubly adversarial inputs[J].arXiv:1906.02443,2019.
[42] IYYER M,WIETING J,GIMPEL K,et al.Adversarial example generation with syntactically controlled paraphrase networks[J].arXiv:1804.06059,2018.
[43] ZHAO Z,DUA D,SINGH S.Generating natural adversarial examples[J].arXiv:1710.11342,2017.
[44] ARJOVSKY M,CHINTALA S,BOTTOU L.Wasserstein gan[J].arXiv:1701.07875,2017.
[45] WALLACE E,RODRIGUEZ P,FENG S,et al.Trick me if you can:Human-in-the-loop generation of adversarial examples for question answering[J].Transactions of the Association for Computational Linguistics,2019,7(2019):387-401.
[46] RIBEIRO M T,SINGH S,GUESTRIN C.Semantically equivalent adversarial rules for debugging nlp models[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.2018:856-865.
[47] LI J,JI S,DU T,et al.Textbugger:Generating adversarial text against real-world applications[J].arXiv:1812.05271,2018.
[48] EBRAHIMI J,RAO A,LOWD D,et al.Hotflip:White-box adversarial examples for text classification[J].arXiv:1712.06751,2017.
[49] VIJAYARAGHAVAN P,ROY D.Generating Black-Box Ad-versarial Examples for Text Classifiers Using a Deep Reinforced Model[J].arXiv:1909.07873,2019.
[50] LIANG B,LI H,SU M,et al.Deep text classification can be fooled[J].arXiv:1704.08006,2017.
[51] GARDNER M,ARTZI Y,BASMOVA V,et al.Evaluating nlp models via contrast sets[J].arXiv:2004.02709,2020.
[52] PRUTHI D,DHINGRA B,LIPTON Z C.Combating adversarial misspellings with robust word recognition[J].arXiv:1905.11268,2019.
[53] ZHOU Y,JIANG J Y,CHANG K W,et al.Learning to discriminate perturbations for blocking adversarial attacks in text classification[J].arXiv:1909.03084,2019.
[54] TANAY T,GRIFFIN L D.A New Angle on L2 Regularization[J].arXiv:1806.11186,2018.
[55] PAPERNOT N,MCDANIEL P,WU X,et al.Distillation as a defense to adversarial perturbations against deep neural networks[C]//2016 IEEE Symposium on Security and Privacy(SP).IEEE,2016:582-597.
[56] MIYATO T,DAI A M,GOODFELLOW I.Adversarial training methods for semi-supervised text classification[J].arXiv:1605.07725,2016.
[57] MADRY A,MAKELOV A,SCHMIDT L,et al.Towards deep learning models resistant to adversarial attacks[J].arXiv:1706.06083,2017.
[58] LI L,QIU X.TextAT:Adversarial Training for Natural Language Understanding with Token-Level Perturbation[J].arXiv:2004.14543,2020.
[59] DINAN E,HUMEAU S,CHINTAGUNTA B,et al.Build it break it fix it for dialogue safety:Robustness from adversarial human attack[J].arXiv:1908.06083,2019.
[60] HE W,WEI J,CHEN X,et al.Adversarial example defense:Ensembles of weak defenses are not strong[C]//11th USENIX Workshop on Offensive Technologies (WOOT 17).2017.
[61] KO C Y,LYU Z,WENG T W,et al.POPQORN:Quantifying robustness of recurrent neural networks[J].arXiv:1905.07387,2019.
[62] SHI Z,ZHANG H,CHANG K W,et al.Robustness verification for transformers[J].arXiv:2002.06622,2020.
[63] GOODMAN D,XIN H,YANG W,et al.Advbox:a toolbox to generate adversarial examples that fool neural networks[J].arXiv:2001.05574,2020.
[64] ATHALYE A,CARLINI N,WAGNER D.Obfuscated gradients give a false sense of security:Circumventing defenses to adversarial examples[J].arXiv:1802.00420,2018.
[65] WALLACE E,FENG S,KANDPAL N,et al.Universal adversarial triggers for nlp[J].arXiv:1908.07125,2019.
[66] LIANG R G,LYU P Z,et al.A Survey of Audiovisual Deepfake Detection Techniques[J].Journal of Cyber Security,2020,5(2):1-17.
[67] YU L,ZHANG W,WANG J,et al.SeqGAN:Sequence generative adversarial nets with policy gradient[C]//Thirty-First AAAI Conference on Artificial Intelligence.2017.