Computer Science ›› 2020, Vol. 47 ›› Issue (7): 282-286. doi: 10.11896/jsjkx.200100135

• Information Security •

Reinforcement Learning Based Cache Scheduling Against Denial-of-Service Attacks in Embedded Systems

HUANG Jin-hao1, DING Yu-zhen1, XIAO Liang1, SHEN Zhi-rong1, ZHU Zhen-min2

  1. 1 School of Informatics, Xiamen University, Xiamen 361005, China
    2 Institute of Computing Technology, University of Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2020-01-21 Online: 2020-07-15 Published: 2020-07-16
  • Corresponding author: XIAO Liang (lxiao@xmu.edu.cn)
  • Author e-mail: 506109624@qq.com
  • About author: HUANG Jin-hao, born in 1996, postgraduate. His main research interests include network security and wireless communication.
    XIAO Liang, born in 1980, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. Her main research interests include network security and wireless communication.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61971366, 61671396).

Abstract: In multicore embedded operating systems, the way the central processor schedules the shared last-level cache (LLC) determines both the instructions per cycle (IPC) of the user processes and the robustness against denial-of-service (DoS) attacks. However, existing cache scheduling schemes rely on a specific LLC scheduling model and DoS attack model, which makes it difficult for the processor to obtain the running information of the user processes in time in each scheduling cycle under different scheduling environments. This paper therefore proposes a reinforcement learning (RL) based LLC scheduling scheme for embedded systems to defend against DoS attacks. The scheme optimizes the occupied position and the occupied space of the LLC based on the measured start and end positions of each user process's LLC occupancy, together with the previous IPC, load miss rate and store miss rate. In a dynamic LLC scheduling environment, the processor can increase the IPC and at the same time reduce the success rate of DoS attacks from malicious processes without knowing the DoS attack model. Simulation results for a multicore embedded operating system shared by multitenant virtual machines show that the proposed scheme can significantly increase the IPC and reduce the success rate of DoS attacks.

Key words: DoS attack, Embedded systems, LLC scheduling, Reinforcement learning
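
The abstract describes a learning loop in which the processor observes the LLC start and end positions, the previous IPC, the load miss rate and the store miss rate, and then picks a new occupied position and space. A minimal sketch of such a loop is given below, assuming tabular Q-learning with an epsilon-greedy policy; the abstract does not say which RL algorithm the paper uses. The number of ways (`N_WAYS`), the quantization, the reward, and the `toy_feedback` model are all hypothetical stand-ins and are not taken from the paper.

```python
import random
from collections import defaultdict

N_WAYS = 8  # assumed number of LLC ways; not taken from the paper
# Each action is a contiguous LLC region (start, end), i.e. a position and a size.
ACTIONS = [(s, e) for s in range(N_WAYS) for e in range(s, N_WAYS)]


def quantize(x, levels=4):
    """Map a value in [0, 1] to one of `levels` bins (clamped)."""
    return max(0, min(int(x * levels), levels - 1))


def make_state(start, end, ipc, load_miss, store_miss, ipc_max=4.0):
    """Discrete state built from the five observations listed in the abstract."""
    return (start, end, quantize(ipc / ipc_max),
            quantize(load_miss), quantize(store_miss))


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the candidate LLC regions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update; returns the new Q-value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def toy_feedback(action):
    """Hypothetical stand-in for hardware counters: IPC saturates at 4 ways."""
    size = action[1] - action[0] + 1
    ipc = 0.5 * min(size, 4)
    miss = 1.0 - min(size, 4) / 4.0
    reward = ipc - 0.1 * size  # assumed utility: IPC minus a cost per occupied way
    return ipc, miss, miss, reward


def train(steps=2000, epsilon=0.2, seed=0):
    """Run the learning loop against the toy feedback model."""
    rng = random.Random(seed)
    q = defaultdict(float)
    state = make_state(0, 0, 0.0, 1.0, 1.0)
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        ipc, load_miss, store_miss, reward = toy_feedback(action)
        next_state = make_state(*action, ipc, load_miss, store_miss)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q
```

Calling `train()` returns the learned Q-table; in a real system, `toy_feedback` would be replaced by measurements from the performance counters, and the reward would also need to reflect whether a DoS attack succeeded, which this sketch does not model.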

CLC Number:

  • TP316