Computer Science, 2014, Vol. 41, Issue (Z6): 42-46.

• Intelligent Computing •

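Task Scheduling Algorithm for MapReduce Based on DAG

TANG Yi-tao (唐一韬), HUANG Jing (黄晶), XIAO Qiu (肖球)

  1. Changsha Social Work College, Changsha 410004, Hunan; College of Information Science and Engineering, Hunan University, Changsha 410082; College of Information Science and Engineering, Hunan University, Changsha 410082
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the 2011 Hunan Province "Twelfth Five-Year Plan" research project "Research and Practice of an Innovative IT Talent Model Based on a Multidimensional Teaching Environment" and by 2012 funding from the Ministry of Education of China.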

Abstract: Hadoop has become a foundational platform for cloud computing research, and MapReduce is its computing model for distributed processing of big data. To address MapReduce's shortcomings in data distribution, data locality, and job execution flow on heterogeneous clusters, we propose a DAG-based MapReduce scheduling algorithm. The algorithm groups cluster nodes by computing capability, converts MapReduce jobs into a DAG model, and improves the calculation of the upward rank so that it is more accurate on heterogeneous clusters and yields a more reasonable task priority ordering. Taking node computing capability, data locality, and cluster utilization together, it selects suitable data nodes on which to place and execute tasks, reducing the completion time of the current task. Experiments show that the algorithm distributes data reasonably, improves data locality, reduces communication overhead, and shortens the schedule length of the whole job set, thereby improving cluster utilization.

Key words: DAG, Scheduling algorithm, MapReduce, Hadoop, Heterogeneous environment, Big data

CLC number: TP302 Document code: A
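
The abstract does not spell out the improved upward rank, so the sketch below illustrates only the classic HEFT-style definition that such methods usually start from, not the authors' variant: the rank of a task is its execution time averaged over all nodes, plus the largest value of (communication cost + rank) over its successors. The function name `upward_ranks`, the task names, and all cost figures are hypothetical and chosen purely for illustration.

```python
from functools import lru_cache


def upward_ranks(succ, comp_cost, comm_cost):
    """Classic HEFT upward rank for every task in a DAG (illustrative only).

    succ:      task -> list of successor tasks
    comp_cost: task -> list of execution times, one entry per cluster node
    comm_cost: (task, successor) -> average data transfer time
    """
    @lru_cache(maxsize=None)
    def rank(task):
        # Average execution time across heterogeneous nodes.
        avg = sum(comp_cost[task]) / len(comp_cost[task])
        # Longest remaining path through any successor; 0 for exit tasks.
        tail = max(
            (comm_cost.get((task, s), 0.0) + rank(s) for s in succ.get(task, [])),
            default=0.0,
        )
        return avg + tail

    return {t: rank(t) for t in comp_cost}


# Hypothetical job: two map tasks feeding one reduce task on a 2-node cluster.
succ = {"map1": ["reduce"], "map2": ["reduce"], "reduce": []}
comp_cost = {"map1": [4.0, 8.0], "map2": [2.0, 4.0], "reduce": [3.0, 5.0]}
comm_cost = {("map1", "reduce"): 2.0, ("map2", "reduce"): 1.0}

ranks = upward_ranks(succ, comp_cost, comm_cost)
# Tasks with a higher upward rank are scheduled first.
order = sorted(ranks, key=ranks.get, reverse=True)
print(ranks)
print(order)
```

Sorting by descending rank always places a task before its successors, which is why this quantity is a common basis for priority ordering in DAG schedulers.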

