Computer Science, 2023, Vol. 50, Issue (1): 351-361. doi: 10.11896/jsjkx.220800269
魏楠1, 魏祥麟2, 范建华2, 薛羽1, 胡永扬2
WEI Nan1, WEI Xianglin2, FAN Jianhua2, XUE Yu1, HU Yongyang2
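Abstract: Deep reinforcement learning (DRL) has attracted wide attention for multi-user intelligent dynamic spectrum access because of its strengths in agent perception and decision-making. However, the weak interpretability of deep neural networks leaves DRL models vulnerable to backdoor attacks. For the dynamic spectrum access (DSA) scenario built on DRL models in cognitive wireless networks, a non-invasive, low-overhead backdoor attack method is proposed. The attacker monitors channel usage to select a non-invasive backdoor trigger, then adds backdoor samples to the training pool of the secondary user's DRL model, so that the backdoor is implanted in the model during the training phase. In the inference phase, the attacker actively transmits signals to activate the trigger in the model, causing the secondary user to take the target action and lowering the secondary user's channel access success rate. Simulation results show that the proposed backdoor attack method achieves an attack success rate above 90% in DSA scenarios of different scales, reduces attack overhead by 20% to 30% compared with continuous attack, and applies to three different types of DRL models.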
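The training-phase step described in the abstract, mixing backdoor samples into the secondary user's training pool, can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the state is modelled as a tuple of 0/1 channel-occupancy observations, and the names `TRIGGER`, `TARGET_ACTION`, `POISON_REWARD`, `make_backdoor_samples`, and `poison_pool` are hypothetical. The sketch only shows how trigger states could be paired with a target action inside a replay pool; it does not reproduce the authors' trigger-selection procedure or their DRL models.

```python
import random
from typing import List, Tuple

# A transition is (state, action, reward, next_state). The state is ASSUMED to be
# a tuple of 0/1 channel-occupancy observations, one entry per channel.
Transition = Tuple[Tuple[int, ...], int, float, Tuple[int, ...]]

# Hypothetical values chosen only for illustration (not from the paper).
TRIGGER: Tuple[int, ...] = (1, 0, 1, 0)  # occupancy pattern used as the trigger
TARGET_ACTION: int = 0                   # action the backdoor should provoke
POISON_REWARD: float = 1.0               # reward attached to backdoor samples


def make_backdoor_samples(n: int, rng: random.Random) -> List[Transition]:
    """Build n transitions that pair the trigger state with the target action."""
    samples: List[Transition] = []
    for _ in range(n):
        # The next state is drawn at random; a real environment would supply it.
        next_state = tuple(rng.randint(0, 1) for _ in range(len(TRIGGER)))
        samples.append((TRIGGER, TARGET_ACTION, POISON_REWARD, next_state))
    return samples


def poison_pool(pool: List[Transition], poison_rate: float,
                rng: random.Random) -> List[Transition]:
    """Return a shuffled copy of the pool with backdoor samples mixed in.

    poison_rate is the ratio of backdoor samples to clean samples.
    """
    n_poison = int(len(pool) * poison_rate)
    poisoned = list(pool) + make_backdoor_samples(n_poison, rng)
    rng.shuffle(poisoned)
    return poisoned
```

In this sketch the poisoning ratio is a free parameter; the abstract does not state which ratio the authors used.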