Computer Science, 2021, Vol. 48, Issue (7): 333-339. doi: 10.11896/jsjkx.201100154

• Computer Network •

Reinforcement Learning Based Energy Allocation Strategy for Multi-access Wireless Communications with Energy Harvesting

WANG Ying-kai, WANG Qing-shan

  1. School of Mathematics, Hefei University of Technology, Hefei 230001, China
  • Received: 2020-11-23 Revised: 2021-02-09 Online: 2021-07-15 Published: 2021-07-02
  • Corresponding author: WANG Qing-shan (qswang@hfut.edu.cn)
  • About author: WANG Ying-kai, born in 1996, postgraduate. His main research interests include reinforcement learning and wireless communication. (2019111237@mail.hfut.edu.cn)
    WANG Qing-shan, born in 1973, Ph.D supervisor, is a member of China Computer Federation. His main research interests include edge computing and gesture recognition.
  • Supported by:
    National Natural Science Foundation of China (61571179).

Abstract: As the Internet of Things (IoT) spreads, the energy demands of IoT terminal devices keep rising. Energy harvesting is a promising way to relieve device energy shortages by generating renewable energy. Because the renewable energy available in an unknown environment is uncertain, IoT terminal devices need a sound and effective energy allocation strategy to keep the system running continuously and stably. This paper proposes a deep reinforcement learning energy allocation strategy based on DQN, which interacts directly with the unknown environment through the DQN algorithm to approach the optimal energy allocation strategy without relying on prior knowledge of the environment. Building on this, a pre-training algorithm is proposed, based on the characteristics of reinforcement learning and the time-invariant nature of the system, to optimize the strategy's initialization state and learning rate. Simulation experiments under different channel data conditions show that the proposed strategy outperforms existing strategies under each channel condition and adapts well when the scenario changes.

Key words: Deep reinforcement learning, Energy harvesting, Markov decision process, Resource allocation, Wireless communication

CLC Number:

  • TP391
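
The abstract describes a learner that interacts directly with an unknown energy-harvesting environment to approach a good allocation policy. The page gives no system model, so the following is only a toy sketch under loud assumptions: a discrete battery, two hypothetical channel gains, a uniformly random harvest of 0 to 2 units per slot, and a logarithmic throughput reward. It uses tabular Q-learning as a simplified stand-in for the paper's DQN, and it does not reproduce the paper's pre-training algorithm. The names `step` and `train` and all parameter values are illustrative, not taken from the paper.

```python
import math
import random


def step(battery, action, channel_gain, harvested, capacity):
    """One slot of a toy energy-harvesting link (assumed model, not the paper's).

    Spend up to `action` energy units, earn a log-throughput reward,
    then recharge with `harvested` units, capped at `capacity`.
    """
    spend = min(action, battery)  # cannot spend more than is stored
    reward = math.log2(1.0 + channel_gain * spend)
    next_battery = min(battery - spend + harvested, capacity)
    return next_battery, reward


def train(episodes=300, slots=50, capacity=5,
          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over (battery level, channel index) states."""
    rng = random.Random(seed)
    gains = [0.5, 2.0]  # hypothetical "bad" and "good" channel gains
    q = {(b, c): [0.0] * (capacity + 1)
         for b in range(capacity + 1) for c in range(len(gains))}
    for _ in range(episodes):
        battery, c = capacity, rng.randrange(len(gains))
        for _ in range(slots):
            valid = list(range(battery + 1))  # feasible energy spends
            if rng.random() < epsilon:
                a = rng.choice(valid)  # explore
            else:
                a = max(valid, key=lambda x: q[(battery, c)][x])  # exploit
            harvested = rng.randint(0, 2)  # assumed harvest process
            nb, r = step(battery, a, gains[c], harvested, capacity)
            nc = rng.randrange(len(gains))  # assumed i.i.d. channel
            best_next = max(q[(nb, nc)][x] for x in range(nb + 1))
            # Standard Q-learning update toward the bootstrapped target.
            q[(battery, c)][a] += alpha * (r + gamma * best_next - q[(battery, c)][a])
            battery, c = nb, nc
    return q


if __name__ == "__main__":
    table = train()
    for state in sorted(table):
        b, _ = state
        best = max(range(b + 1), key=lambda x: table[state][x])
        print(f"battery={b} channel={state[1]} -> spend {best}")
```

The agent never sees the harvest or channel statistics; it learns only from observed transitions, which is the property the abstract emphasizes. Replacing the table with a neural network that maps the state to per-action values would move this sketch toward a DQN.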