Computer Science ›› 2021, Vol. 48 ›› Issue (7): 47-54. doi: 10.11896/jsjkx.210400021

Special topic: Artificial Intelligence Security

  • Corresponding author: HE Jun-jiang (hejunjiang@stu.scu.edu.cn)

DRL-IDS: Deep Reinforcement Learning Based Intrusion Detection System for Industrial Internet of Things

LI Bei-bei, SONG Jia-rui, DU Qing-yun, HE Jun-jiang   

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610041, China
  • Received: 2021-03-31 Revised: 2021-04-28 Online: 2021-07-15 Published: 2021-07-02
  • About author: LI Bei-bei, born in 1992, Ph.D., associate professor, is a member of the China Computer Federation. His main research interests include cyber-physical system security, industrial control system security, big data & privacy preservation, and applied cryptography. (libeibei@scu.edu.cn)
    HE Jun-jiang, born in 1993, Ph.D., assistant professor. His main research interests include cyber security, artificial immune systems, data mining, machine learning, and evolutionary computing.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1805400), National Natural Science Foundation of China (U19A2068, 62002248), China Postdoctoral Science Foundation (2019TQ0217, 2020M673277), Provincial Key Research and Development Program of Sichuan (20ZDYF3145), and Fundamental Research Funds for the Central Universities (YJ201933).

Abstract: In recent years, the Industrial Internet of Things (IIoT) has developed rapidly. While enabling industrial digitization, automation, and intelligence, the IIoT has also introduced a large number of cyber threats. Moreover, the complex, heterogeneous, and distributed IIoT environment has created a brand-new attack surface for cyber intruders, and traditional intrusion detection techniques no longer meet the needs of threat detection in this environment. This paper proposes an intrusion detection system for the IIoT based on a deep reinforcement learning algorithm, Proximal Policy Optimization 2.0 (PPO2). The system combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning to detect multiple types of cyber attacks on the IIoT. First, a LightGBM-based feature selection algorithm selects the most effective feature set from the IIoT data. Then, the hidden layer of a multilayer perceptron network is used as the network structure shared by the value network and the policy network in the PPO2 algorithm. Finally, the intrusion detection model is built on the PPO2 algorithm, and ReLU (Rectified Linear Unit) is employed for the classification output. Extensive experiments on a real IIoT dataset released by the Oak Ridge National Laboratory, sponsored by the U.S. Department of Energy, show that the proposed system achieves 99.09% accuracy in detecting multiple types of network attacks on the IIoT. It also outperforms intrusion detection systems based on state-of-the-art deep learning models (e.g., LSTM, CNN, RNN) and deep reinforcement learning models (e.g., DDQN, DQN) in terms of accuracy, precision, recall, and F1 score.
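
To make the reinforcement learning step more concrete, the following minimal sketch shows the clipped surrogate objective that Proximal Policy Optimization maximizes, applied to a classification-style setting. It is an illustration only, not the authors' implementation: the function names `classification_reward` and `clipped_surrogate`, the +1/-1 reward scheme, and the clipping range of 0.2 are assumptions that the abstract does not specify.

```python
def classification_reward(predicted_label, true_label):
    """Hypothetical reward: +1 for a correct classification, -1 otherwise.

    The abstract does not state the reward design; this is an assumption.
    """
    return 1.0 if predicted_label == true_label else -1.0


def clipped_surrogate(new_probs, old_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_probs / old_probs: probability that the updated / previous policy
    assigns to the action (predicted class) actually taken for each sample.
    advantages: advantage estimate for each sample.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Two samples: one where the policy moved towards a good action,
    # one where it moved away from a bad action.
    objective = clipped_surrogate(
        new_probs=[0.9, 0.2],
        old_probs=[0.6, 0.4],
        advantages=[1.0, -1.0],
    )
    print(round(objective, 6))
```

The clipping limits how far a single update can move the policy away from the one that collected the data, which is the property that makes PPO-style training comparatively stable.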

Key words: Cyber security, Deep reinforcement learning, Industrial internet of things, Intrusion detection system, PPO2 algorithm

CLC number:

  • TP393