Computer Science, 2016, Vol. 43, Issue (Z11): 11-15. doi: 10.11896/j.issn.1002-137X.2016.11A.003

• Intelligent Computing •

Adaptive Fault-tolerant Scheduling Algorithm for Unresponsive Task Based on Speculation

CUI Yun-fei, WU Xiao-jin, DAI Ye, CHENG Xiao, GUO Gang

  1. Beijing Aerospace Control Center, Beijing 100094, China (all five authors)
  • Published: 2018-12-01  Online: 2018-12-01
  • Funding:
    This work was supported by a key ministerial-level project.

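Abstract: Existing fault-tolerant scheduling algorithms for unresponsive tasks declare a task failed once a static time threshold is exceeded, so they cannot adapt to the dynamic cluster load of big-data processing centers. To address this problem, a method is proposed that adaptively adjusts the time threshold used to declare an unresponsive task failed. Based on this model, an adaptive fault-tolerant scheduling algorithm for unresponsive tasks (AFTS) is designed. By analyzing parameters such as the job size, the size of individual tasks, and the speculated execution time of the remaining work, AFTS adaptively adjusts the failure threshold for unresponsive tasks. This reduces the impact of unresponsive tasks on overall job execution efficiency and lowers job response time. The adaptive failure-determination method was validated, and the algorithm's performance evaluated, on a prototype system developed for this work. Experimental results show that AFTS outperforms existing fault-tolerant scheduling algorithms for unresponsive tasks in job response time and related metrics.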

Key words: Big data, Fault-tolerant scheduling, Adaptive, Speculative, MapReduce
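The abstract names the inputs to the adaptive threshold (job size, individual task size, speculated remaining execution time) but gives no formula. The Python sketch below illustrates the general idea under our own assumptions; it is not the AFTS formula. A silent task's threshold grows with its input size, scaled by the seconds-per-MB rate observed on finished tasks of the same job. Job size decides how many finished tasks are needed before that rate is trusted. The speculated remaining job time caps how long the scheduler waits. `JobStats`, `adaptive_timeout`, `is_failed`, the 5% sample rule, and the `slack`/`floor_s` values are hypothetical. Only the 600 s static baseline corresponds to a real setting: Hadoop's default `mapreduce.task.timeout` of 600000 ms.

```python
import math
from dataclasses import dataclass, field

# Static baseline: Hadoop's default mapreduce.task.timeout (600000 ms).
STATIC_TIMEOUT_S = 600.0


@dataclass
class JobStats:
    """Hypothetical per-job statistics a scheduler could track."""
    total_tasks: int        # job size, measured in tasks
    est_remaining_s: float  # speculated time until the rest of the job finishes
    done_durations_s: list[float] = field(default_factory=list)  # run times of finished tasks
    done_sizes_mb: list[float] = field(default_factory=list)     # input sizes of finished tasks


def adaptive_timeout(job: JobStats, task_size_mb: float,
                     slack: float = 2.0, floor_s: float = 30.0) -> float:
    """Illustrative adaptive failure threshold (seconds of silence) for one task.

    Assumed rules, not the paper's: all weights and cut-offs are hypothetical.
    """
    # Job size: trust the rate estimate only once ~5% of tasks (at least 3) finished.
    needed = max(3, math.ceil(0.05 * job.total_tasks))
    total_mb = sum(job.done_sizes_mb)
    if len(job.done_durations_s) < needed or total_mb <= 0:
        return STATIC_TIMEOUT_S  # not enough evidence: fall back to the static value

    # Task size: scale the observed seconds-per-MB rate to this task's input.
    sec_per_mb = sum(job.done_durations_s) / total_mb
    expected_s = sec_per_mb * task_size_mb

    # Remaining job time: never wait longer than the rest of the job needs,
    # and never longer than the static default.
    ceiling = min(STATIC_TIMEOUT_S, max(floor_s, job.est_remaining_s))
    return max(floor_s, min(slack * expected_s, ceiling))


def is_failed(silent_s: float, threshold_s: float) -> bool:
    """Declare the task failed once it has been silent longer than the threshold."""
    return silent_s > threshold_s


if __name__ == "__main__":
    job = JobStats(total_tasks=100, est_remaining_s=300.0,
                   done_durations_s=[40.0] * 10, done_sizes_mb=[64.0] * 10)
    t = adaptive_timeout(job, task_size_mb=64.0)
    print(t, is_failed(120.0, t))  # 80.0 True
```

Suppose 10 of 100 tasks have finished at 40 s per 64 MB input. A new 64 MB task is then declared failed after 80 s of silence instead of the static 600 s. If the rest of the job is expected to finish within 50 s, the cap lowers the threshold to 50 s. Capping by the remaining time reflects the intuition in the abstract: near the end of a job, waiting on an unresponsive task translates directly into longer job response time.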
