Computer Science (计算机科学), 2019, Vol. 46, Issue (9): 291-297. doi: 10.11896/j.issn.1002-137X.2019.09.044

• Interdiscipline & Frontier •

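Virtual Machine Placement Strategy with Energy Consumption Optimization under Reinforcement Learning

LU Hai-feng, GU Chun-hua, LUO Fei, DING Wei-chao, YUAN Ye, REN Qiang

  1. (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
  • Received: 2018-08-28 Online: 2019-09-15 Published: 2019-09-02
  • Corresponding author: GU Chun-hua (born 1970), male, Ph.D., professor and doctoral supervisor. His main research interests are the Internet of Things, cloud computing, and software engineering. E-mail: chgu@ecust.edu.cn
  • About the authors: LU Hai-feng (born 1993), male, Ph.D. candidate; main research interests: cloud computing and reinforcement learning. LUO Fei (born 1978), male, Ph.D., associate professor; main research interest: distributed computing. DING Wei-chao (born 1989), male, Ph.D., lecturer; main research interests: cloud resource scheduling, distributed computing, and multi-objective optimization. YUAN Ye (born 1995), male, master's degree; main research interest: cloud computing. REN Qiang (born 1993), male, master's student; main research interest: cloud computing.
  • Supported by:
    General Program of the National Natural Science Foundation of China (61472139); East China University of Science and Technology 2017 Research Project on the Laws and Methods of Education and Teaching (ZH1726107)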


Abstract: The rapid development of cloud data centers has brought very powerful computing capacity, but the accompanying energy consumption problem has become increasingly serious. To reduce the energy consumption of physical servers in cloud data centers, the virtual machine placement problem is first modeled with reinforcement learning. The Q-Learning(λ) algorithm is then optimized in two respects: state aggregation and temporal credit. Finally, virtual machine placement is simulated on the cloud simulation platform CloudSim with a real-world dataset. The experimental results show that, compared with the Q-Learning, Greedy, and PSO algorithms, the optimized Q-Learning(λ) algorithm reduces the energy consumption of physical servers more effectively and maintains better results across different numbers of virtual machine placement requests, which gives it strong practical value.

Key words: Cloud computing, Energy consumption optimization, Q-Learning(λ) algorithm, Reinforcement learning, Virtual machine placement

CLC Number: TP181
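
The abstract names the ingredients (a reinforcement-learning model of placement, Q-Learning(λ), state aggregation, temporal credit) without giving their details, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It uses textbook tabular Watkins-style Q(λ) on a toy problem. The linear power model (`P_IDLE`, `P_MAX`), the bucketed state in `aggregate`, the reward (negative increase in total power), and every function name are hypothetical choices made for illustration. The sketch assumes at least as many hosts as VMs and that each demand fits on an empty host, so a feasible host always exists.

```python
import random
from collections import defaultdict

P_IDLE, P_MAX = 100.0, 200.0  # assumed linear power model, in watts


def host_power(util):
    """Power draw of one host; an unused host is assumed to be switched off."""
    return 0.0 if util <= 0 else P_IDLE + (P_MAX - P_IDLE) * util


def total_power(loads, capacity):
    return sum(host_power(load / capacity) for load in loads)


def aggregate(loads, capacity, vm_idx, buckets=4):
    """State aggregation: map each host load to one of `buckets` coarse levels."""
    levels = tuple(min(buckets - 1, buckets * load // capacity) for load in loads)
    return (vm_idx, levels)


def feasible_hosts(loads, demand, capacity):
    return [h for h, load in enumerate(loads) if load + demand <= capacity]


def train(demands, n_hosts, capacity, episodes=2000,
          alpha=0.1, gamma=1.0, lam=0.8, eps=0.1, seed=0):
    """Tabular Watkins-style Q(lambda) over aggregated states (illustrative)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        loads = [0] * n_hosts
        traces = defaultdict(float)  # eligibility traces for this episode
        for i, demand in enumerate(demands):
            s = aggregate(loads, capacity, i)
            options = feasible_hosts(loads, demand, capacity)
            greedy = max(options, key=lambda h: Q[(s, h)])
            a = rng.choice(options) if rng.random() < eps else greedy
            if Q[(s, a)] < Q[(s, greedy)]:
                traces.clear()  # exploratory action: cut credit to earlier steps
            before = total_power(loads, capacity)
            loads[a] += demand
            reward = before - total_power(loads, capacity)  # minus the added power
            best_next = 0.0  # terminal value after the last VM
            if i + 1 < len(demands):
                s2 = aggregate(loads, capacity, i + 1)
                nxt = feasible_hosts(loads, demands[i + 1], capacity)
                best_next = max(Q[(s2, h)] for h in nxt)
            delta = reward + gamma * best_next - Q[(s, a)]
            traces[(s, a)] += 1.0
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam  # decay credit for older decisions
    return Q


def place(Q, demands, n_hosts, capacity):
    """Greedy rollout of the learned table; returns the final host loads."""
    loads = [0] * n_hosts
    for i, demand in enumerate(demands):
        s = aggregate(loads, capacity, i)
        options = feasible_hosts(loads, demand, capacity)
        loads[max(options, key=lambda h: Q.get((s, h), 0.0))] += demand
    return loads


if __name__ == "__main__":
    demands = [50, 50, 30, 70, 40, 60]  # toy CPU demands in capacity units
    Q = train(demands, n_hosts=len(demands), capacity=100)
    loads = place(Q, demands, n_hosts=len(demands), capacity=100)
    print(loads, total_power(loads, 100))
```

In this sketch the eligibility traces are what spread each temporal-difference error back over earlier placement decisions, and the bucketed utilization levels keep the table small. How the paper actually aggregates states or weights temporal credit is not stated in the abstract and may differ substantially.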