Computer Science ›› 2015, Vol. 42 ›› Issue (4): 141-146. doi: 10.11896/j.issn.1002-137X.2015.04.028


Load Balancing Strategy Based on Pressure Feedback on MapReduce

LI Hang-chen, QIN Xiao-lin and SHEN Yao   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Data skew is one of the factors that most seriously degrade MapReduce performance. Existing solutions to the data skew problem place an extra burden on users, who must either supply a partition function tailored to the specific application or write additional sampling processes for MapReduce. To solve this problem, we present a load balancing strategy based on pressure statistics. To obtain the global data distribution, we compute the statistics while the data is being prepared, which makes full use of the shuffle stage in MapReduce. To balance the entire cluster, the strategy schedules the heavily loaded nodes according to the data distribution, without requiring the user to provide additional input. In addition, because applications are complex, we introduce a pressure feedback mechanism that further improves the performance of the scheduling policy. The experimental results show that our strategy is far more efficient than the default strategy.

Key words: MapReduce, Data skew, Load balance, Pressure feedback
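
The general idea of balancing reducers from a global key distribution can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`collect_key_counts`, `assign_keys`, `reducer_loads`), the sample counts, and the greedy "heaviest key to least-loaded reducer" rule are all assumptions made for illustration, and the pressure feedback mechanism is not modeled.

```python
import heapq
from collections import Counter


def collect_key_counts(map_outputs):
    """Merge per-mapper key counts into one global distribution (hypothetical helper)."""
    total = Counter()
    for counts in map_outputs:
        total.update(counts)
    return total


def assign_keys(key_counts, num_reducers):
    """Greedy assignment: the heaviest remaining key goes to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count, breaking ties by key so the result is deterministic.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment


def reducer_loads(key_counts, assignment, num_reducers):
    """Total number of records each reducer receives under the assignment."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return loads


# Illustrative (made-up) per-mapper statistics with one skewed key, "a".
map_outputs = [
    {"a": 50, "b": 5, "c": 5},
    {"a": 40, "d": 20, "e": 10},
    {"b": 15, "c": 15, "f": 20},
]
key_counts = collect_key_counts(map_outputs)
assignment = assign_keys(key_counts, 3)
loads = reducer_loads(key_counts, assignment, 3)
print(loads)
```

In this sketch a single key is never split across reducers, so the heaviest key sets a lower bound on the maximum reducer load; the remaining keys are spread over the other reducers.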

