Computer Science ›› 2021, Vol. 48 ›› Issue (7): 316-323. doi: 10.11896/jsjkx.200800095
梁俊斌1,2, 张海涵1,2, 蒋婵3, 王天舒4
LIANG Jun-bin1,2, ZHANG Hai-han1,2, JIANG Chan3, WANG Tian-shu4
Abstract: Mobile edge computing (MEC) is a network computing paradigm that has emerged in recent years. It places server nodes with relatively strong computing and storage capabilities at the network edge, close to mobile devices (e.g., near base stations), so that mobile devices can offload tasks to nearby edge servers for processing. This overcomes a drawback of traditional networks: because mobile devices have weak computing and storage capabilities and limited energy, they are forced to offload tasks to distant cloud platforms, which costs considerable time and energy and poses security risks. However, a difficult multi-objective programming problem remains: a device that holds only limited local information (e.g., the number of its neighbors) must decide, according to the size and number of its tasks, whether to execute them locally or to offload them, fully or partially, to the MEC server that is optimal in both delay and energy consumption, in a dynamic network whose wireless channels vary over time. Traditional optimization techniques (e.g., convex optimization) rarely obtain good results for this problem. Deep reinforcement learning (DRL), a new artificial-intelligence technique that combines deep learning with reinforcement learning, can make more accurate decisions for complex cooperation and game problems and has broad application prospects in industry, agriculture, commerce, and other fields. In recent years, using DRL to optimize task offloading in MEC networks has become a new research trend. Over the past three years, some researchers have made preliminary explorations and achieved lower delay and energy consumption than earlier approaches that used deep learning or reinforcement learning alone, but many shortcomings remain. To further advance research in this field, this paper analyzes, compares, and summarizes recent work at home and abroad in detail, identifies its advantages and disadvantages, and discusses directions for future in-depth research.
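To make the offloading decision described above more concrete, the following is a minimal, self-contained sketch (Python/PyTorch) of a DQN-style agent that observes only local information (task size, required CPU cycles, current channel gain) and learns whether to execute a task locally or offload it to an edge server under a weighted delay-energy objective. The environment dynamics, cost weights, class and variable names, and network sizes are illustrative assumptions for exposition, not the method of any specific paper surveyed here.

```python
# Minimal sketch (assumed toy model, not any surveyed paper's method):
# a DQN agent choosing between local execution and offloading to an MEC server.
import random
import numpy as np
import torch
import torch.nn as nn

class OffloadEnv:
    """Toy single-device MEC environment: state = (task size, CPU cycles needed, channel gain)."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self._new_state()

    def _new_state(self):
        size = self.rng.uniform(0.1, 1.0)            # task size (Mbit), illustrative range
        cycles = size * self.rng.uniform(0.5, 2.0)   # CPU cycles needed (Gcycles)
        gain = self.rng.uniform(0.1, 1.0)            # time-varying wireless channel gain
        return np.array([size, cycles, gain], dtype=np.float32)

    def step(self, state, action):
        size, cycles, gain = state
        if action == 0:   # local execution: slower CPU, higher energy per cycle
            delay, energy = cycles / 0.5, cycles * 0.9
        else:             # offload: transmission cost depends on channel, faster edge CPU
            tx = size / (2.0 * gain)
            delay, energy = tx + cycles / 4.0, tx * 0.3
        reward = -(0.5 * delay + 0.5 * energy)       # weighted delay-energy objective
        return self._new_state(), reward

# Small Q-network mapping the 3-dimensional local state to Q-values of the 2 actions.
q_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
env, buffer, gamma, eps = OffloadEnv(), [], 0.95, 0.2

state = env.reset()
for step in range(5000):
    # epsilon-greedy action selection: 0 = local execution, 1 = offload
    if random.random() < eps:
        action = random.randrange(2)
    else:
        with torch.no_grad():
            action = int(q_net(torch.tensor(state)).argmax())
    next_state, reward = env.step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state
    if len(buffer) >= 64:                            # experience replay: sample a minibatch
        s, a, r, s2 = map(np.array, zip(*random.sample(buffer, 64)))
        q = q_net(torch.tensor(s)).gather(1, torch.tensor(a).long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = torch.tensor(r, dtype=torch.float32) + gamma * q_net(torch.tensor(s2)).max(1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad(); loss.backward(); opt.step()
```

The surveyed works extend this basic idea in several directions, e.g. actor-critic and policy-gradient methods for continuous or partial offloading, prioritized experience replay, and multi-user or multi-server action spaces.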