Computer Science ›› 2025, Vol. 52 ›› Issue (12): 374-383. doi: 10.11896/jsjkx.250300064

• Information Security •


Highly Robust Model Structure Backdoor Method Based on Feature Distribution

CHEN Xianyi1,2,3, ZHANG Chengjuan2, QIAN Jiangfeng4, GUO Qianbin2, CUI Qi1,2, FU Zhangjie1,2   

  1 Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2 School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
    3 Jiangsu Yuchi Blockchain Technology Research Institute Co., Ltd., Nanjing 210018, China
    4 NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China
  • Received: 2025-03-12  Revised: 2025-05-21  Published: 2025-12-15  Online: 2025-12-09
  • Corresponding author: CUI Qi (cuiqi@nuist.edu.cn)
  • About author: CHEN Xianyi (xianyi_chen@nuist.edu.cn), born in 1986, Ph.D., associate professor, master's supervisor, is a member of CCF (No.56536M). His main research interests include artificial intelligence security and big data security.
    CUI Qi, born in 1994, Ph.D., associate professor, master's supervisor. His main research interests include information hiding and deep learning model security.



Abstract: Model backdoor attacks traditionally hide triggers within model parameters, activating predetermined outputs when specific samples are presented. However, such methods are vulnerable to defense techniques like parameter pruning, making the backdoors difficult to trigger. This paper introduces a novel approach that bases backdoor triggering on feature distribution, creating a structure-based backdoor independent of model parameters and achieving both high concealment and high robustness. Firstly, distribution-based triggers in the model's feature space are used to generate backdoor images, enabling more stable backdoor activation and improving attack reliability. Secondly, a backdoor structure consisting of a distribution detector and a backdoor register is embedded within the target layers. Because this structured backdoor does not rely on model parameters, it significantly enhances robustness and resistance to detection. Finally, the distribution detector extracts the distribution-based trigger patterns while the backdoor register activates and contaminates the model's features, ensuring that the backdoor triggers precisely under the expected conditions for more targeted effects. Experimental results demonstrate that the proposed method maintains a 100% attack success rate even after 20 rounds of parameter modification and can evade multiple advanced backdoor detectors.
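The detector/register mechanism summarized above lends itself to a compact illustration. The PyTorch sketch below is a hypothetical reconstruction, not the authors' implementation: the class name DistributionBackdoorBlock, the statistics checked (per-sample feature mean and standard deviation), and all threshold and shift values are assumptions made for illustration. It captures the key property the abstract claims: the backdoor lives in fixed, non-learnable structure, so parameter-level defenses such as pruning or fine-tuning leave it intact.

import torch
import torch.nn as nn

class DistributionBackdoorBlock(nn.Module):
    """Hypothetical detector/register pair inserted after a target layer.

    The "distribution detector" checks whether a sample's feature
    statistics match the trigger distribution planted in the backdoor
    image; the "backdoor register" then contaminates the features of
    matching samples. All constants live in non-learnable buffers, so
    modifying the model's trainable weights does not remove the backdoor.
    """

    def __init__(self, target_mean=0.8, target_std=0.3,
                 tol=0.05, poison_shift=3.0):
        super().__init__()
        self.register_buffer("target_mean", torch.tensor(target_mean))
        self.register_buffer("target_std", torch.tensor(target_std))
        self.tol = tol                    # detector tolerance (assumed value)
        self.poison_shift = poison_shift  # register contamination strength

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Detector: per-sample mean/std over all non-batch dimensions.
        dims = tuple(range(1, feats.dim()))
        mean, std = feats.mean(dim=dims), feats.std(dim=dims)
        hit = ((mean - self.target_mean).abs() < self.tol) & \
              ((std - self.target_std).abs() < self.tol)
        # Register: add a constant shift to the features of triggered
        # samples, steering them toward the target class downstream.
        mask = hit.float().view(-1, *([1] * (feats.dim() - 1)))
        return feats + self.poison_shift * mask

# Embedding the block inside a toy backbone's target layer:
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    DistributionBackdoorBlock(),   # structural backdoor, no trainable weights
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
x = torch.rand(4, 3, 32, 32)       # clean inputs rarely match the detector
print(model(x).shape)              # torch.Size([4, 10])

The companion step, omitted here, would be the paper's trigger generation: optimizing a backdoor image so that its features at this layer match target_mean and target_std, which is what makes the activation condition stable under input-level perturbations.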

Key words: Backdoor attack, Deep neural networks, Machine learning, Robustness, Model security

CLC Number: TP393