Computer Science ›› 2025, Vol. 52 ›› Issue (11): 425-433. doi: 10.11896/jsjkx.240900007

• Information Security •


Neural Network Backdoor Sample Filtering Method Based on Deep Partition Aggregation

GUO Jiaming1, DU Wentao1, YANG Chao2,3   

1 School of Cyber Science and Technology,Hubei University,Wuhan 430062,China
    2 School of Computer Science,Hubei University,Wuhan 430062,China
    3 Engineering Research Center of Hubei Province in Intelligent Government Affairs and Application of Artificial Intelligence,Wuhan 430062,China
  • Received:2024-09-02 Revised:2024-11-21 Online:2025-11-15 Published:2025-11-06
  • Corresponding author:YANG Chao(stevenyc@hubu.edu.cn)
  • About author:GUO Jiaming,born in 2000,postgraduate(1196951311@qq.com).His main research interests include AI security and offence-defense.
    YANG Chao,born in 1982,Ph.D,professor,postgraduate supervisor,is a member of CCF(No.94791M).His main research interests include information security and computer immunology.
  • Supported by:
    National Natural Science Foundation of China(61977021) and Key R&D Program of Hubei Province,China(2021BAA188).


Abstract: Deep neural networks are vulnerable to backdoor attacks, in which an attacker implants a backdoor and hijacks the model's behavior by poisoning its training data. Among these attacks, class-specific attacks can bypass most defense methods because their mapping relationship is complex and closely tied to the normal classification task, which makes them especially threatening. This paper studies the relationship between attack success rate and model classification performance during the implantation of a class-specific backdoor, summarizes three properties of this process, and on that basis designs a sample filtering method against class-specific attacks. The method uses the Deep Partition Aggregation (DPA) ensemble learning method together with majority voting to filter backdoor samples from the dataset over repeated iterations. Based on the three properties of class-specific attacks, the effectiveness of the filtering method is proved mathematically, and extensive experiments are conducted on standard classification datasets: after four iterations, more than 95% of the backdoor samples are filtered out in every experiment. Comparative experiments with the latest sample filtering methods further demonstrate the superiority of the proposed method against class-specific attacks. All experiments are based on backdoorbox, an open-source project on GitHub.
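The core mechanism described above, a DPA ensemble whose majority vote drives repeated filtering rounds, can be illustrated with a minimal sketch. Everything in the sketch is an illustrative assumption rather than the authors' implementation: the base learner (a scikit-learn logistic regression standing in for the paper's deep networks), the stride-based partition rule, the keep-if-label-matches-vote criterion, and the names dpa_filter, n_partitions, and n_rounds are not taken from the paper.

    # Minimal, illustrative sketch of iterative DPA-based sample filtering
    # (not the authors' code). Assumes integer class labels, flat feature
    # vectors, and that every partition still contains at least two classes.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def dpa_filter(X, y, n_partitions=10, n_rounds=4):
        # Indices of samples that are still trusted.
        keep = np.arange(len(X))
        for _ in range(n_rounds):
            # Disjoint partition of the trusted data (DPA); here a simple
            # stride split stands in for the paper's partitioning scheme.
            parts = [keep[i::n_partitions] for i in range(n_partitions)]
            models = [LogisticRegression(max_iter=200).fit(X[p], y[p])
                      for p in parts]
            # Every base model votes on every trusted sample.
            votes = np.stack([m.predict(X[keep]) for m in models])  # (k, n)
            majority = np.apply_along_axis(
                lambda v: np.bincount(v).argmax(), 0, votes)
            # Drop samples whose given label disagrees with the majority vote;
            # poisoned (relabeled) samples tend to lose this vote.
            keep = keep[majority == y[keep]]
        return keep

    # Example: filter a poisoned training set, then retrain on the survivors.
    # clean_idx = dpa_filter(X_train, y_train, n_partitions=10, n_rounds=4)

In a real pipeline the base models would be the deep networks used in the paper, with poisoned datasets supplied by backdoorbox's utilities; the sketch only shows the partition-vote-filter loop itself.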

Key words: Deep learning, Data poisoning, Backdoor attack, Class-specific attack, Ensemble learning, Sample filtering

CLC Number: TN915.08

References:
[1]BROWN A,HUH J,CHUNG J S,et al.VoxSRC 2021:The Third VoxCeleb Speaker Recognition Challenge[J].arXiv:2201.04583,2022.
[2]QIU X P,SUN T X,XU Y G,et al.Pre-trained models for natural language processing:A survey[J].Science China(Technological Sciences),2020,63(10):1872-1897.
[3]BISONG E.Building Machine Learning and Deep Learning Models on Google Cloud Platform:A Comprehensive Guide for Beginners[M].Berkeley:Apress,2019.
[4]YAN B,LAN J,YAN Z.Backdoor attacks against voice recognition systems:A survey[J].arXiv:2307.13643,2023.
[5]LI Y,JIANG Y,LI Z,et al.Backdoor learning:A survey[J].IEEE Transactions on Neural Networks and Learning Systems,2022,35(1):5-22.
[6]GAO Y,DOAN B G,ZHANG Z,et al.Backdoor attacks and countermeasures on deep learning:A comprehensive review[J].arXiv:2007.10760,2020.
[7]JAVAHERIPI M,SAMRAGH M,FIELDS G,et al.Cleann:Accelerated trojan shield for embedded neural networks[C]//Proceedings of the 39th International Conference on Computer-Aided Design.2020:1-9.
[8]TIAN Z,CUI L,LIANG J,et al.A comprehensive survey on poisoning attacks and countermeasures in machine learning[J].ACM Computing Surveys,2022,55(8):1-35.
[9]GU T,LIU K,DOLAN-GAVITT B,et al.BadNets:Evaluating Backdooring Attacks on Deep Neural Networks[J].IEEE Access,2019,7:47230-47244.
[10]WANG B,YAO Y,SHAN S,et al.Neural cleanse:Identifying and mitigating backdoor attacks in neural networks[C]//2019 IEEE Symposium on Security and Privacy(SP).IEEE,2019:707-723.
[11]DONG Y,YANG X,DENG Z,et al.Black-box detection of backdoor attacks with limited information and data[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:16482-16491.
[12]CHOU E,TRAMER F,PELLEGRINO G.Sentinet:Detecting localized universal attacks against deep learning systems[C]//2020 IEEE Security and Privacy Workshops(SPW).IEEE,2020:48-54.
[13]GUO J,LI Y,CHEN X,et al.Scale-up:An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency[J].arXiv:2302.03251,2023.
[14]HOU L,FENG R,HUA Z,et al.IBD-PSC:Input-level Backdoor Detection via Parameter-oriented Scaling Consistency[J].arXiv:2405.09786,2024.
[15]LEVINE A,FEIZI S.Deep partition aggregation:Provable defense against general poisoning attacks[J].arXiv:2006.14768,2020.
[16]KRIZHEVSKY A.Learning multiple layers of features from tiny images[J/OL].http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
[17]SAADNA Y,BEHLOUL A.An overview of traffic sign detection and classification methods[J].International Journal of Multimedia Information Retrieval,2017,6:193-210.
[18]CHEN X,LIU C,LI B,et al.Targeted backdoor attacks on deep learning systems using data poisoning[J].arXiv:1712.05526,2017.
[19]LI Y,ZHAI T,JIANG Y,et al.Backdoor attack in the physical world[J].arXiv:2104.02361,2021.
[20]NGUYEN A,TRAN A.WaNet:Imperceptible warping-based backdoor attack[J].arXiv:2102.10369,2021.
[21]DOAN K,LAO Y,ZHAO W,et al.Lira:Learnable,imperceptible and robust backdoor attacks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:11966-11976.
[22]SOURI H,FOWL L,CHELLAPPA R,et al.Sleeper agent:Scalable hidden trigger backdoors for neural networks trained from scratch[J].Advances in Neural Information Processing Systems,2022,35:19165-19178.
[23]TRAN B,LI J,MADRY A.Spectral Signatures in Backdoor Attacks[J].arXiv:1811.00636,2018.
[24]HAYASE J,KONG W.SPECTRE:Defending against backdoor attacks using robust covariance estimation[C]//International Conference on Machine Learning.2021:4129-4139.
[25]ZENG Y,PARK W,MAO Z M,et al.Rethinking the backdoor attacks' triggers:A frequency perspective[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:16473-16481.
[26]HUANG H,MA X,ERFANI S,et al.Distilling cognitive backdoor patterns within an image[J].arXiv:2301.10908,2023.
[27]AMARNATH C,BALWANI A H,MA K,et al.Tesda:Transform enabled statistical detection of attacks in deep neural networks[J].arXiv:2110.08447,2021.
[28]CHEN B,CARVALHO W,BARACALDO N,et al.Detecting backdoor attacks on deep neural networks by activation clustering[J].arXiv:1811.03728,2018.
[29]LIU G,KHREISHAH A,SHARADGAH F,et al.An adaptive black-box defense against trojan attacks(trojdef)[J].IEEE Transactions on Neural Networks and Learning Systems,2022,35(4):5367-5381.
[30]CHEN W,WU B,WANG H.Effective backdoor defense by exploiting sensitivity of poisoned samples[J].Advances in Neural Information Processing Systems,2022,35:9727-9737.
[31]LECUN Y,JACKEL L D,BOTTOU L,et al.Learning algorithms for classification:A comparison on handwritten digit recognition[J].Neural Networks:the Statistical Mechanics Perspective,1995,261(276):2.
[32]BREIMAN L.Bagging Predictors[J].Machine Learning,1996,24:123-140.