Computer Science, 2015, Vol. 42, Issue (4): 141-146. doi: 10.11896/j.issn.1002-137X.2015.04.028
LI Hang-chen (李航晨), QIN Xiao-lin (秦小麟) and SHEN Yao (沈尧)
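Abstract: Data skew is one of the factors that most seriously degrade MapReduce performance. Existing solutions to data skew either require users to supply a partition function tailored to the application type or require an additional sampling pass to be written for MapReduce, which increases the burden on users. To address these problems, this paper proposes a load-balancing strategy based on pressure statistics. The strategy makes full use of the MapReduce shuffle phase: statistics are gathered while the reducers prepare their data, so that the global data distribution is obtained. Based on this distribution, the system reschedules work away from heavily loaded nodes to balance the load across the whole cluster, without requiring any extra input from the user. In addition, to account for the different types of upper-layer applications, a pressure feedback mechanism is introduced to further improve the performance of the scheduling strategy. Experimental results show that the proposed load-balancing scheduling strategy outperforms the default strategy.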
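The abstract does not spell out the scheduling algorithm itself. As a rough illustration only, the Python sketch below assumes that each mapper reports a per-key record count that can be collected during the shuffle, merges those counts into a global distribution, and then assigns keys to reducers with a greedy longest-processing-time (LPT) heuristic instead of by hash (Hadoop's default `HashPartitioner` ignores key sizes). The function names `merge_histograms` and `balance_keys`, the record-count load metric, and the LPT heuristic are all hypothetical stand-ins chosen for illustration; they are not the authors' method and do not model the pressure feedback mechanism.

```python
import heapq
from collections import Counter


def merge_histograms(mapper_histograms):
    """Sum per-mapper {key: record_count} maps into one global distribution.

    Hypothetical helper: assumes every mapper can report such a histogram.
    """
    total = Counter()
    for hist in mapper_histograms:
        total.update(hist)  # Counter.update adds counts for matching keys
    return total


def balance_keys(distribution, num_reducers):
    """Greedy LPT assignment: heaviest key goes to the currently lightest reducer.

    Returns (assignment, loads): assignment maps key -> reducer index and
    loads[i] is the total record count sent to reducer i.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Descending size; ties broken by key so the result is deterministic.
    for key, size in sorted(distribution.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + size, reducer))
    loads = [0] * num_reducers
    for load, reducer in heap:
        loads[reducer] = load
    return assignment, loads


# Usage: two mappers report skewed per-key counts.
mappers = [
    {"a": 30, "b": 20, "c": 10},
    {"a": 10, "d": 25, "e": 15},
]
dist = merge_histograms(mappers)  # a=40, d=25, b=20, e=15, c=10
assignment, loads = balance_keys(dist, 2)
print(loads)  # -> [55, 55]
```

Key-level assignment like this cannot split a single very large key across reducers; the example data is chosen so that an even split exists.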