Computer Science, 2015, Vol. 42, Issue (4): 141-146. doi: 10.11896/j.issn.1002-137X.2015.04.028

• Software and Database Technology •

Load Balancing Strategy Based on Pressure Feedback on MapReduce

LI Hang-chen, QIN Xiao-lin and SHEN Yao

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61373015, 61300052, 41301407) and the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China

Abstract: Data skew is one of the factors that seriously degrade MapReduce performance. Existing solutions to the data skew problem require users either to supply a partition function tailored to the application type or to write an additional sampling pass for MapReduce, which increases the burden on users. To address this problem, we propose a load balancing strategy based on pressure statistics. The strategy makes full use of the shuffle stage in MapReduce: while the reducers are preparing their data, it collects statistics to obtain the global data distribution. Based on this distribution, the system schedules the heavily loaded nodes to balance the load of the whole cluster, without requiring any additional input from the user. In addition, to account for the different types of upper-layer applications, we introduce a pressure feedback mechanism that further improves the performance of the scheduling strategy. Experimental results show that the proposed load balancing scheduling strategy outperforms the default strategy.

Key words: MapReduce, Data skew, Load balancing, Pressure feedback
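
The abstract does not give the details of the pressure statistics, the scheduling rule, or the feedback mechanism, so the following is only a minimal sketch of the general idea it describes: count the intermediate data per key as map output arrives, then use that global distribution to place keys on reducers. Everything here is an assumption made for illustration: record counts stand in for "pressure", a greedy heaviest-key-first heuristic stands in for the scheduler, and the names `default_partition`, `balanced_assignment`, and `reducer_loads` are hypothetical, not part of Hadoop or of the paper.

```python
import heapq
from collections import Counter


def default_partition(key, num_reducers):
    """Deterministic stand-in for hash partitioning (assumption, not Hadoop's hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer_loads(key_counts, assignment, num_reducers):
    """Total number of records each reducer receives under a key -> reducer map."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return loads


def balanced_assignment(key_counts, num_reducers):
    """Greedy heuristic: give the heaviest remaining key to the least-loaded reducer.

    All records of one key still go to a single reducer, as MapReduce requires.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count, then by key, so the result is deterministic.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment


# Hypothetical intermediate keys emitted by three map tasks; key "a" is skewed.
map_outputs = [
    ["a"] * 6 + ["b"] * 2,
    ["a"] * 4 + ["c"] * 3 + ["d"] * 3,
    ["e"] * 2 + ["b"] * 2 + ["f"] * 2,
]
num_reducers = 3

# Statistics gathered while the data is being shuffled: records per key.
key_counts = Counter(key for output in map_outputs for key in output)

default_map = {key: default_partition(key, num_reducers) for key in key_counts}
default_loads = reducer_loads(key_counts, default_map, num_reducers)

balanced_map = balanced_assignment(key_counts, num_reducers)
balanced_loads = reducer_loads(key_counts, balanced_map, num_reducers)

print("default loads: ", default_loads)
print("balanced loads:", balanced_loads)
```

In this toy input the single hot key sets a floor on the heaviest reducer's load, so per-key placement can only reduce skew down to that floor; handling a key that is itself too large would need a different technique than the one sketched here.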

