Computer Science ›› 2015, Vol. 42 ›› Issue (10): 50-56.


Load Balancing Strategy on MapReduce with Locality-aware

LI Hang-chen, QIN Xiao-lin and SHEN Yao   

  • Online: 2018-11-14   Published: 2018-11-14

Abstract: Existing research on load balancing strategies for MapReduce does not consider the distribution characteristics of intermediate data or the network traffic overhead, which results in additional network traffic and reduced system efficiency. To solve this problem, this paper presents a locality-aware load balancing strategy. By taking advantage of the new resource management features introduced by YARN, the strategy obtains the data distribution at the moment the buffered data are written to local disk. It then schedules the reduce tasks according to this data distribution together with the processing speed of each node, so as to decrease network overhead while keeping the load across nodes as balanced as possible. In addition, to further improve the performance of the scheduling strategy under data skew, the paper introduces fine-grained partitioning and self-adaptive fragmentation. Comparative experimental results show that the presented strategy effectively improves performance and reduces the total network traffic overhead.

Key words: MapReduce, Data locality, Data skew, Load balance
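
The scheduling idea summarized in the abstract (place each reduce task by weighing where its intermediate data already resides against how fast each node works) can be illustrated with a small greedy sketch. This is not the authors' algorithm, which the abstract does not specify in detail; the function name `assign_partitions`, the `bandwidth` parameter, and the additive cost model (estimated finish time plus transfer time for remote bytes) are assumptions made here purely for illustration.

```python
from typing import Dict


def assign_partitions(
    dist: Dict[int, Dict[str, int]],
    speed: Dict[str, float],
    bandwidth: float,
) -> Dict[int, str]:
    """Greedily map each reduce partition to a node.

    dist[p][n] : bytes of partition p already stored on node n
    speed[n]   : assumed processing rate of node n (bytes per second)
    bandwidth  : assumed network rate for fetching remote bytes (bytes per second)
    """
    # Total size of every partition across all nodes.
    size = {p: sum(per_node.values()) for p, per_node in dist.items()}
    # Bytes of reduce input already assigned to each node.
    load = {n: 0 for n in speed}
    placement: Dict[int, str] = {}

    # Place the largest partitions first so small ones can fill the gaps.
    for p in sorted(dist, key=lambda q: -size[q]):

        def cost(n: str) -> float:
            remote = size[p] - dist[p].get(n, 0)      # bytes that must cross the network
            finish = (load[n] + size[p]) / speed[n]   # estimated processing time on n
            return finish + remote / bandwidth

        best = min(speed, key=cost)
        placement[p] = best
        load[best] += size[p]

    return placement
```

With two nodes where the second is twice as fast, a partition stored mostly on the slower node still stays there when the transfer cost outweighs the speed advantage, while a partition spread evenly across both nodes goes to the faster one.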

