Computer Science ›› 2026, Vol. 53 ›› Issue (2): 387-395. doi: 10.11896/jsjkx.241200020

• Computer Network •

Multi-objective Optimization for Virtual Machine Placement in Large-scale Hadoop Cluster

WEN Jia1,2,3, WU Shuxia1,2,3, YU Zhengxin4, MIAO Wang5, CHEN Zheyi1,2,3

    1 College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China
    2 Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou 350002, China (Chinese listing: 大数据智能教育部工程研究中心)
    3 Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116, China
    4 School of Computing and Communications, Lancaster University, Lancaster LA1 4YW, UK
    5 Department of Computer Science, University of Exeter, Exeter EX4 4QF, UK
  • Received: 2024-12-02 Revised: 2025-03-18 Online: 2026-02-10
  • Corresponding author: CHEN Zheyi (z.chen@fzu.edu.cn)
  • About author: (1448341761@qq.com)
    WEN Jia, born in 2000, postgraduate, is a member of CCF (No.P7488G). Her main research interests include cloud/edge computing and virtual machine placement.
    CHEN Zheyi, born in 1991, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No.41902M). His main research interests include cloud/edge computing, resource optimization and machine learning.
  • Supported by:
    National Natural Science Foundation of China (62202103), Natural Science Foundation of Fujian Province for Distinguished Young Scholars (2025J010020), Central Funds Guiding the Local Science and Technology Development (2022L3004), Fujian Province Technology and Economy Integration Service Platform (2023XRH001) and Fuzhou-Xiamen-Quanzhou National Independent Innovation Demonstration Zone Collaborative Innovation Platform (2022FX5).

Abstract: Virtualization technology has become a core support for the rapid development of cloud computing. Hadoop is a distributed framework widely used in cloud environments, but the performance of Hadoop clusters is often limited by inefficient resource management. As data volume and cluster scale keep growing, it is challenging to optimize Virtual Machine (VM) placement in a Hadoop cluster so as to reduce energy consumption, increase resource utilization, and shorten file access latency. To address this challenge, this paper proposes a novel Multi-objective Optimization with Variable Length Double chromosome (MO-VLD) method for VM placement in large-scale Hadoop clusters. First, a double chromosome structure is designed by combining the variable length chromosome with the Non-dominated Sorting Genetic Algorithm-III (NSGA-III). Next, two-stage crossover and mutation operations are introduced to enhance the diversity of solution space exploration. Extensive simulation experiments on real-world runtime datasets from the Google cluster demonstrate that the proposed MO-VLD method can effectively handle dynamic resource demands and improve the resource management efficiency of the Hadoop cluster. Compared with benchmark methods, the MO-VLD method shows superior performance in terms of energy consumption, resource utilization, and file access latency.

Key words: Cloud computing, Hadoop, Virtual machine placement, Multi-objective optimization, Genetic algorithm

CLC Number:

  • TP393
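
The abstract names NSGA-III as the base algorithm and three objectives: energy consumption, resource utilization, and file access latency. The paper's chromosome encoding and operators are not described on this page, so the following is only a minimal sketch of Pareto dominance, the relation that non-dominated sorting relies on. The function names (`to_minimization`, `dominates`, `pareto_front`) and all sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A candidate VM placement scored on three objectives, all expressed as
# "smaller is better". This scoring layout is an assumption for illustration.
Objectives = Tuple[float, float, float]


def to_minimization(energy: float, utilization: float, latency: float) -> Objectives:
    """Negate utilization so that every objective is minimized."""
    return (energy, -utilization, latency)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the points that no other point dominates (the first front)."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical placements: (energy, utilization, latency).
candidates = [
    to_minimization(120.0, 0.80, 15.0),
    to_minimization(100.0, 0.70, 20.0),
    to_minimization(130.0, 0.75, 18.0),  # worse than the first on all three
    to_minimization(100.0, 0.70, 25.0),  # same as the second but higher latency
]
front = pareto_front(candidates)
```

Here the first two candidates trade energy against utilization and latency, so neither dominates the other and both stay on the front. A full NSGA-III implementation would additionally rank the later fronts and use reference points to select among non-dominated solutions, which this sketch omits.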