Computer Science (计算机科学), 2021, Vol. 48, Issue 7: 333-339. doi: 10.11896/jsjkx.201100154
王英恺, 王青山
WANG Ying-kai, WANG Qing-shan
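Abstract: As the Internet of Things (IoT) spreads, the energy demands placed on IoT terminal devices are rising. Energy harvesting is a promising technology that addresses device energy shortages by generating renewable energy. Because renewable energy is uncertain in unknown environments, IoT terminal devices need a sound and effective energy allocation policy to keep the system working continuously and stably. This paper proposes a deep reinforcement learning energy allocation policy based on DQN. The policy uses the DQN algorithm to interact directly with the unknown environment and approximate the target optimal energy allocation policy, without relying on prior knowledge of the environment. Building on this, and drawing on the characteristics of reinforcement learning and the time-invariant nature of the system, a pre-training algorithm is proposed to optimize the policy's initial state and learning rate. Comparative simulation experiments under different channel data conditions show that the proposed policy outperforms existing policies under every channel condition tested and also adapts well when the scenario changes.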
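The interaction loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a tabular Q-learning agent stands in for the DQN's neural network, and the battery capacity, harvest amounts, channel gains, log-rate reward, and all function names are assumptions chosen only for illustration. The agent observes the battery level and channel, picks how much energy to spend, and updates its value estimates from the observed reward alone, with no prior model of the environment.

```python
import math
import random

B_MAX = 5            # battery capacity in energy units (assumed)
HARVEST = (0, 1, 2)  # possible energy arrivals per slot (assumed)
GAINS = (0.5, 2.0)   # two illustrative channel gains (assumed)


def step(state, spend, rng, gains=GAINS):
    """Spend `spend` units, earn a rate reward, then harvest new energy."""
    battery, gain_idx = state
    reward = math.log2(1.0 + gains[gain_idx] * spend)
    battery = min(B_MAX, battery - spend + rng.choice(HARVEST))
    return reward, (battery, rng.randrange(len(gains)))


def greedy(q, state):
    """Best known spend level; never more than the battery holds."""
    return max(range(state[0] + 1), key=lambda a: q.get((state, a), 0.0))


def train(slots=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0, q=None, gains=GAINS):
    """Epsilon-greedy Q-learning; pass `q` to warm-start from an earlier run."""
    rng = random.Random(seed)
    q = {} if q is None else q
    state = (B_MAX, 0)
    for _ in range(slots):
        if rng.random() < eps:
            action = rng.randrange(state[0] + 1)   # explore
        else:
            action = greedy(q, state)              # exploit
        reward, nxt = step(state, action, rng, gains)
        best_next = max(q.get((nxt, a), 0.0) for a in range(nxt[0] + 1))
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = nxt
    return q
```

Calling `q = train()` and then `greedy(q, (3, 1))` returns the learned spend level for a battery of 3 units on the stronger channel. Passing the same table back in, for example `train(q=q, gains=(0.2, 1.0))`, reuses it as the starting point under different channel gains (the table is updated in place). This loosely mirrors the pre-training idea, though the paper's actual pre-training algorithm is not specified in the abstract.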