计算机科学 ›› 2022, Vol. 49 ›› Issue (3): 11-22.doi: 10.11896/jsjkx.210900117

• 新兴分布式计算技术与系统* 上一篇    下一篇

减少Hadoop集群中网络队头阻塞的调度算法

田冰川, 田臣, 周宇航, 陈贵海, 窦万春   

  1. 南京大学计算机科学与技术系 南京210023
  • 收稿日期:2021-09-14 修回日期:2021-12-12 出版日期:2022-03-15 发布日期:2022-03-15
  • 通讯作者: 田臣(tianchen@nju.edu.cn)
  • 作者简介:(bctian@smail.nju.edu.cn)
  • 基金资助:
    广东省重点研发计划(2020B0101390001);国家自然科学基金(61772265,61802172,62072228)

Reducing Head-of-Line Blocking on Network in Hadoop Clusters

TIAN Bing-chuan, TIAN Chen, ZHOU Yu-hang, CHEN Gui-hai, DOU Wan-chun   

  1. Department of Computer Science and Technology,Nanjing University,Nanjing 210023,China
  • Received:2021-09-14 Revised:2021-12-12 Online:2022-03-15 Published:2022-03-15
  • About author:TIAN Bing-chuan,born in 1993,Ph.D.His main research interests include network verification,programmable networks and distributed systems.
    TIAN Chen,born in 1978,Ph.D,asso-ciate professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include data center networks,distributed systems,internet streaming and urban computing.
  • Supported by:
    Key-Area Research and Development Program of Guangdong Province(2020B0101390001) and National Natural Science Foundation of China(61772265,61802172,62072228).

摘要: 大数据分析系统的用户希望任务的执行时间尽可能短。然而,在任务执行期间,网络与计算时刻都可能成为阻碍任务执行的资源瓶颈。通过对大数据分析系统的观察与分析,得出如下结论:1)根据当前资源瓶颈的不同,数据并行框架应当在多种工作模式之间切换;2)子任务的调度应当充分考虑将来可能到达的新任务,而不能仅考虑当前已经提交的任务。基于上述观察,设计并实现了全新的任务调度系统Duopoly,其由感知计算资源的网络调度器cans与感知网络资源的子任务调度器nats两部分组成。通过小规模物理集群与大规模仿真实验对Duopoly的效果进行评估,实验结果表明,与现有工作相比,Duopoly可以将平均任务完成时间缩短37.30%~76.16%。

关键词: Hadoop集群, 队头阻塞, 任务调度, 网络调度

Abstract: Users of big data analytics systems want the execution time of tasks to be as short as possible.However,during task execution,both network and computational moments may become resource bottlenecks that hinder task execution.Through the observation and analysis of the big data analysis system,the following conclusions are drawn:1)the data-parallel framework should switch between multiple working modes depending on the current resource bottlenecks;2)the scheduling of subtasks should fully consider the new tasks that may arrive in the future,not only the currently submitted tasks.Based on the above observations,a new task scheduling system Duopoly is designed and implemented,which consists of two parts:cans,a network scheduler that senses computational resources,and nats,a sub-task scheduler that senses network resources.The effectiveness of Duopoly is evaluated by small-scale physical clusters and large-scale simulation experiments,and the experimental results show that Duopoly can reduce the average task completion time by 37.30%~76.16% compared with existing work.

Key words: Hadoop cluster, Head-of-line blocking, Job scheduling, Network scheduling

中图分类号: 

  • TP393
[1]OUSTERHOUT K,WENDELL P,ZAHARIA M,et al.Spar-row:distributed,low latency scheduling [C]//Proceedings of the 24th Symposium on Operating Systems Principles(SOSP 2013).New York:ACM,2013:69-84.
[2]SHINNAR A,CUNNINGHAM D,SARASWAT V,et al.M3r:increased performance for in-memory hadoop jobs[J].Procee-dings of the VLDB Endowment,2012,5(12):1736-1747.
[3]ZAHARIA M,CHOWDHURY M,DAS T,et al.Resilient distributed datasets:A fault-tolerant abstraction for in-memory cluster computing [C]//Proceedings of the 9th USENIX Confe-rence on Networked Systems Design and Implementation(NSDI 2012).Berkeley:USENIX Association,2012.
[4]TRIVEDI A,STUEDI P,PFEFFERLE J,et al.On the [ir] relevance of network performance for data processing [C]//Proceedings of the 8th USENIX Conference on Hot Topics in Cloud Computing(HotCloud 2016).Berkeley:USENIX Association,2016:126-131.
[5]OUSTERHOUT K,CANEL C,RATNASAMY S,et al.Mono-tasks:Architecting for performance clarity in data analytics frameworks [C]//Proceedings of the 26th Symposium on Ope-rating Systems Principles(SOSP 2017).New York:ACM,2017:184-200.
[6]OUSTERHOUT K,RASTI R,RATNASAMY S,et al.Making sense of performance in data analytics frameworks [C]//Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation(NSDI 2015).Berkeley:USENIX Association,2015:293-307.
[7]CHOWDHURY M,ZAHARIA M,MA J,et al.Managing data transfers in computer clusters with orchestra [C]//Proceedings of the ACM SIGCOMM 2011 Conference(SIGCOMM 2011).New York:ACM,2011:98-109.
[8]DOGAR F R,KARAGIANNIS T,BALLANI H,et al.Decentralized task-aware scheduling for data center networks[J].ACM SIGCOMM Computer Communication Review,2014,44(4):431-442.
[9]CHOWDHURY M,ZHONG Y,STOICA I.Efficient coflowscheduling with varys[J].ACM SIGCOMM Computer Communication Review,2014,44(4):443-454.
[10]CHOWDHURY M,STOICA I.Efficient coflow schedulingwithout prior knowledge[J].ACM SIGCOMM Computer Communication Review,2015,45(4):393-406.
[11]ZAHARIA M,BORTHAKUR D,SARMA J S,et al.Delayscheduling:a simple technique for achieving locality and fairness in cluster scheduling [C]//Proceedings of the 5th European Conference on Computer Systems(EuroSys 2010).New York:ACM,2010:265-278.
[12]ISARD M,PRABHAKARAN V,CURREY J,et al.Quincy:fair scheduling for distributed computing clusters [C]//Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles(SOSP 2009).New York:ACM,2009:261-276.
[13]ANANTHANARAYANAN G,KANDULA S,GREENBERGA G,et al.Reining in the outliers in map-reduce clusters using mantri [C]//Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation(OSDI 2010).Berkeley:USENIX Association,2010:265-278.
[14]AHMAD F,CHAKRADHAR S T,RAGHUNATHAN A,et al.Shuffle-aware scheduling in multi-tenant map-reduce clusters [C]//Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference(USENIX ATC 2014).Berkeley:USENIX Association,2014:1-12.
[15]JALAPARTI V,BODIK P,MENACHE I,et al.Network-aware scheduling for data-parallel jobs:Plan when you can[J].ACM SIGCOMM Computer Communication Review,2015,45(4):407-420.
[16]CHEN Y,ALSPAUGH S,KATZ R.Interactive analytical processing in big data systems:A cross-industry study of mapreduce workloads[J].Proceedings of the VLDB Endowment,2012,5(12):1802-1813.
[17]TAN J,CHIN A,HU Z Z.Dynmr:Dynamic mapreduce with reduce task interleaving and maptask backfilling [C]//Procee-dings of the Ninth European Conference on Computer Systems(EuroSys 2014).New York:ACM,2014:1-14.
[18]KAVULYA S,TAN J,GANDHI R,et al.An analysis of traces from a production mapreduce cluster [C]//Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster,Cloud and Grid Computing(CCGRID 2010).Washington:IEEE,2010:94-103.
[19]CHOWDHURY M,KANDULA S,STOICA I.Leveraging endpoint flexibility in data-intensive clusters[J].ACM SIGCOMM Computer Communication Review,2013,43(4):231-242.
[20]CHEN Y,GANAPATHI A,GRIFFITH R,et al.The case forevaluating mapreduce performance using workload suites [C]//Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling,Analysis,and Simulation of Computer and Telecommunication Systems(MASCOTS 2011).Washington:IEEE,2011:390-399.
[21]COSTA P,DONNELLY A,ROWSTRON A,et al.Camdoop:Exploiting in-network aggregation for big data applications [C]//Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation(NSDI 2012).Berkeley:USENIX Association,2012.
[22]BAI W,CHEN L,CHEN K,et al.Information-agnostic flowscheduling for commodity data centers[C]//Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation(NSDI 2015).Berkeley:USENIX Association,2015:455-468.
[23]PENG Y,CHEN K,WANG G,et al.Hadoopwatch:A first step towards comprehensive traffic forecasting in cloud computing [C]//The 33rd Annual IEEE International Conference on Computer Communications(INFOCOM 2014).Piscataway:IEEE,2014:19-27.
[24]WANG H,CHEN L,CHEN K,et al.Flowprophet:Generic andaccurate traffic prediction for data-parallel cluster computing [C]//International Conference on Distributed Computing Systems(ICDCS 2015).Piscataway:IEEE,2015:349-358.
[25]MUNIR A,HE T,RAGHAVENDRA R,et al.Network scheduling aware task placement in datacenters [C]//Proceedings of the 12th International on Conference on Emerging Networking Experiments and Technologies(CoNEXT 2016).New York:ACM,2016:221-235.
[26]CHANG H,KODIALAM M,KOMPELLA R R,et al.Scheduling in mapreduce-like systems for fast completion time [C]//The 30th IEEE International Conference on Computer Communications(INFOCOM 2011).Piscataway:IEEE,2011:3074-3082.
[27]CHEN F,KODIALAM M,LAKSHMAN T.Joint scheduling of processing and shuffle phases in mapreduce systems [C]//The 31th IEEE International Conference on Computer Communications(INFOCOM 2012).Piscataway:IEEE,2012:1143-1151.
[1] 沈彪, 沈立炜, 李弋.
空间众包任务的路径动态调度方法
Dynamic Task Scheduling Method for Space Crowdsourcing
计算机科学, 2022, 49(2): 231-240. https://doi.org/10.11896/jsjkx.210400249
[2] 谭双杰, 林宝军, 刘迎春, 赵帅.
基于机器学习的分布式星载RTs系统负载调度算法
Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning
计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126
[3] 王政, 姜春茂.
一种基于三支决策的云任务调度优化算法
Cloud Task Scheduling Algorithm Based on Three-way Decisions
计算机科学, 2021, 48(6A): 420-426. https://doi.org/10.11896/jsjkx.201000023
[4] 蔡凌峰, 魏祥麟, 邢长友, 邹霞, 张国敏.
故障场景下的边缘计算DAG任务重调度方法
Failure-resilient DAG Task Rescheduling in Edge Computing
计算机科学, 2021, 48(10): 334-342. https://doi.org/10.11896/jsjkx.210300304
[5] 张龙信, 周立前, 文鸿, 肖满生, 邓晓军.
基于异构云计算的成本约束下的工作流能量高效调度算法
Energy Efficient Scheduling Algorithm of Workflows with Cost Constraint in Heterogeneous Cloud Computing Systems
计算机科学, 2020, 47(8): 112-118. https://doi.org/10.11896/jsjkx.200300038
[6] 孙敏, 陈中雄, 叶侨楠.
云环境下基于HEDSM的工作流调度策略
Workflow Scheduling Strategy Based on HEDSM Under Cloud Environment
计算机科学, 2020, 47(6): 252-259. https://doi.org/10.11896/jsjkx.190400047
[7] 胡俊钦, 张佳俊, 黄引豪, 陈星, 林兵.
边缘环境下DNN应用的计算迁移调度技术
Computation Offloading Scheduling Technology for DNN Applications in Edge Environment
计算机科学, 2020, 47(10): 247-255. https://doi.org/10.11896/jsjkx.190900106
[8] 张洲, 黄国锐, 金培权.
基于Storm的任务调度:现状与研究展望
Task Scheduling on Storm:Current Situations and Research Prospects
计算机科学, 2019, 46(9): 28-35. https://doi.org/10.11896/j.issn.1002-137X.2019.09.004
[9] 曾金晶, 张建山, 林兵, 张文德.
基于无线城域网的微云负载均衡算法
Cloudlet Workload Balancing Algorithm in Wireless Metropolitan Area Networks
计算机科学, 2019, 46(8): 163-170. https://doi.org/10.11896/j.issn.1002-137X.2019.08.027
[10] 张建山, 林兵, 卢宇, 许芙蓉.
基于无线城域网的微云部署及用户任务调度
Cloudlet Placement and User Task Scheduling Based on Wireless Metropolitan Area Networks
计算机科学, 2019, 46(6): 128-134. https://doi.org/10.11896/j.issn.1002-137X.2019.06.019
[11] 叶符明, 李雯婷, 王颖.
MC2ETS:移动云计算中一种能效任务调度算法
MC2ETS:An Energy-efficient Tasks Scheduling Algorithm in Mobile Cloud Computing
计算机科学, 2019, 46(6): 135-142. https://doi.org/10.11896/j.issn.1002-137X.2019.06.020
[12] 马小晋,饶国宾,许华虎.
云计算中任务调度研究的调查
Research on Task Scheduling in Cloud Computing
计算机科学, 2019, 46(3): 1-8. https://doi.org/10.11896/j.issn.1002-137X.2019.03.001
[13] 王卓昊, 杨冬菊, 徐晨阳.
基于ISE算法的分布式ETL任务调度策略研究
Research on Distributed ETL Tasks Scheduling Strategy Based on ISE Algorithm
计算机科学, 2019, 46(12): 1-7. https://doi.org/10.11896/jsjkx.190100023
[14] 徐俊, 项倩红, 肖刚.
基于改进混合蛙跳算法的云工作流负载均衡调度优化
Load Balancing Scheduling Optimization of Cloud Workflow Using Improved Shuffled Frog Leaping Algorithm
计算机科学, 2019, 46(11): 315-322. https://doi.org/10.11896/jsjkx.181001866
[15] 吴修国, 刘翠.
云存储系统中最小开销的数据副本布局转换策略
Data Replicas Distribution Transition Strategy in Cloud Storage System
计算机科学, 2019, 46(10): 202-208. https://doi.org/10.11896/jsjkx.180901623
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!