Computer Science ›› 2021, Vol. 48 ›› Issue (1): 258-267. doi: 10.11896/jsjkx.200500078

• Information Security •

  • Corresponding author: WANG Bin-jun (wangbinjun@ppsuc.edu.cn)
  • Contact: tongxindotnet@outlook.com

Survey on Adversarial Sample of Deep Learning Towards Natural Language Processing

TONG Xin, WANG Bin-jun, WANG Run-zheng, PAN Xiao-qin   

  1. School of Information and Cyber Security,People's Public Security University of China,Beijing 100038,China
  • Received:2020-05-18 Revised:2020-08-25 Online:2021-01-15 Published:2021-01-15
  • About author:TONG Xin,born in 1995,postgraduate,is a member of China Computer Federation.His main research interests include adversarial examples and natural language processing.
    WANG Bin-jun,born in 1962,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include natural language processing and information security.
  • Supported by:
    2020 CCF-Nsfocus “Kunpeng” Research Fund(CCF-NSFOCUS 2020011),Science and Technology Strengthening Police Basic Program of Ministry of Public Security(2018GABJC03),Key Program of the National Social Science Foundation of China(20AZD114),Top Talent Training Special Funding Graduate Research and Innovation Project of People's Public Security University of China(2020ssky005),and Scientific Research and Technological Innovation on Public Security Behavior of People's Public Security University of China.



Abstract: Deep learning models have been proven vulnerable and easy to attack with adversarial examples, but current research on adversarial examples focuses mainly on computer vision and neglects the security of natural language processing (NLP) models. Since NLP faces the same risk from adversarial examples, this paper first clarifies the concepts related to adversarial examples as a basis for further research. It analyzes the causes of model vulnerability, including the complex structure of deep-learning-based NLP models, their hard-to-inspect training process, and their naive underlying principles; elaborates the characteristics, classification, and evaluation metrics of textual adversarial examples; and introduces the typical tasks and classical datasets involved in adversarial-example research in NLP. Secondly, organized by perturbation level, it surveys the mainstream char-level, word-level, sentence-level, and multi-level techniques for generating textual adversarial examples. It then summarizes defense methods relating to data, models, and inference, and compares their advantages and disadvantages. Finally, the open pain points on both the attack and defense sides of current NLP adversarial-example research are further discussed, with an outlook on future work.
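As a concrete illustration of the char-level perturbations surveyed in the abstract, the following minimal Python sketch combines two common operations: homoglyph substitution (replacing a Latin letter with a visually identical Cyrillic one) and adjacent-character swap. The homoglyph map, function name, and sample sentence are illustrative assumptions for this sketch, not drawn from any specific surveyed method.

```python
import random

# Illustrative homoglyph map: Latin letters mapped to visually similar
# Cyrillic look-alikes. This small mapping is an assumption for demonstration;
# real visual attacks draw on much larger confusable-character tables.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}

def char_level_perturb(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Perturb a fraction of characters: substitute a homoglyph when one is
    available, otherwise swap the character with its right-hand neighbor."""
    if not text:
        return text
    rng = random.Random(seed)  # fixed seed keeps the example deterministic
    chars = list(text)
    n_edits = max(1, int(len(chars) * rate))
    for i in rng.sample(range(len(chars)), n_edits):
        ch = chars[i].lower()
        if ch in HOMOGLYPHS:
            chars[i] = HOMOGLYPHS[ch]  # visual substitution, same length
        elif i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # adjacent swap
    return "".join(chars)

original = "deep learning models are vulnerable"
adversarial = char_level_perturb(original, rate=0.2)
print(adversarial)  # reads almost identically to a human
```

Such perturbations leave the text legible to humans while mapping characters outside the model's expected vocabulary (or reordering subword units), which is why char-level attacks can degrade classifiers without changing the sentence's apparent meaning.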

Key words: Natural language processing, Deep learning, AI security, Adversarial examples, Robustness

CLC number: TP301