Computer Science ›› 2023, Vol. 50 ›› Issue (1): 351-361. doi: 10.11896/jsjkx.220800269

• Information Security •

Backdoor Attack Against Deep Reinforcement Learning-based Spectrum Access Model

WEI Nan1, WEI Xianglin2, FAN Jianhua2, XUE Yu1, HU Yongyang2

  1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Received: 2022-08-31 Revised: 2022-09-28 Online: 2023-01-15 Published: 2023-01-09
  • Corresponding author: FAN Jianhua (fjh7659@126.com)
  • Author e-mail: 20201249431@nuist.edu.cn
  • About author: WEI Nan, born in 1998, postgraduate. Her main research interests include deep reinforcement learning and spectrum intelligent computing.
    FAN Jianhua, born in 1971, Ph.D., research fellow. His main research interests include software defined radio and spectrum intelligent computing.

Abstract: Deep reinforcement learning (DRL) has attracted wide attention for multi-user intelligent dynamic spectrum access (DSA) because of its strengths in agent perception and decision making. However, the weak interpretability of deep neural networks (DNNs) leaves DRL models vulnerable to backdoor attacks. This paper proposes a non-invasive, low-cost backdoor attack against DRL-based DSA models in cognitive wireless networks. The attacker monitors channel usage to select non-invasive backdoor triggers, then adds backdoor samples to the training experience pool of a secondary user's DRL model, so that the backdoor is implanted during the training phase. During the inference phase, the attacker actively transmits signals to activate the trigger, inducing the secondary user to take the attacker's target action and thereby lowering its channel access success rate. Simulation results show that the proposed method achieves an attack success rate above 90% in DSA scenarios of different scales, reduces the attack cost by 20%~30% compared with continuous attacks, and applies to three different types of DRL models.

Key words: Dynamic spectrum access, Deep reinforcement learning, Backdoor attack, Trigger
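
The training-phase step summarized in the abstract (adding backdoor samples to a secondary user's experience pool) can be illustrated with a toy sketch. This is not the authors' implementation: the transition layout, the four-channel setting, the trigger pattern `TRIGGER`, the target action, the reward values, and the 5% poisoning rate are all assumptions made here for illustration only.

```python
import random
from collections import namedtuple

# Hypothetical transition layout; the paper's actual sample format is not given here.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

NUM_CHANNELS = 4        # assumed number of channels
TRIGGER = (1, 0, 1, 0)  # assumed trigger: a specific channel-occupancy pattern
TARGET_ACTION = 0       # assumed target action; channel 0 is busy under the trigger


def random_state(rng):
    """Random channel-occupancy observation: 1 = busy, 0 = idle."""
    return tuple(rng.randint(0, 1) for _ in range(NUM_CHANNELS))


def clean_transition(rng):
    """Toy clean sample: reward +1 if the chosen channel is idle, else -1."""
    state = random_state(rng)
    action = rng.randrange(NUM_CHANNELS)
    reward = 1.0 if state[action] == 0 else -1.0
    return Transition(state, action, reward, random_state(rng))


def backdoor_transition(rng):
    """Toy backdoor sample: the trigger observation paired with the target
    action and an (assumed) positive reward, so training links the two."""
    return Transition(TRIGGER, TARGET_ACTION, 1.0, random_state(rng))


def poison_pool(pool, rate, rng):
    """Return a new pool with round(rate * len(pool)) backdoor samples mixed in."""
    n_poison = round(rate * len(pool))
    poisoned = list(pool) + [backdoor_transition(rng) for _ in range(n_poison)]
    rng.shuffle(poisoned)
    return poisoned


rng = random.Random(0)
pool = [clean_transition(rng) for _ in range(1000)]
poisoned_pool = poison_pool(pool, rate=0.05, rng=rng)
```

In this toy setting, a model trained on `poisoned_pool` would see the trigger pattern rewarded for selecting a busy channel, which is the kind of association that the inference-phase trigger signal is meant to exploit.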

CLC Number:

  • TN925