Computer Science (计算机科学), 2018, Vol. 45, Issue (3): 158-164. doi: 10.11896/j.issn.1002-137X.2018.03.025
王华进,黎建辉,沈志宏,周园春
WANG Hua-jin, LI Jian-hui, SHEN Zhi-hong and ZHOU Yuan-chun
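Abstract: Load imbalance ranks first among the factors that degrade the performance of large-scale MapReduce clusters, and Hive join queries trigger it very easily. The general solution is to design a key partitioning algorithm that achieves load balance based on the key frequency distribution of the intermediate key-value pairs. Existing work estimates the key frequency distribution by monitoring and sampling the map output, which incurs high communication overhead and significantly delays the start of the shuffle. For Hive join queries, this paper proposes a key frequency distribution estimation method based on ORC metadata, together with a corresponding load-balancing key partitioning method. The approach requires little computation, incurs low communication overhead, and does not affect the existing shuffle mechanism. Benchmark tests demonstrate a large improvement in the efficiency of key frequency distribution estimation, and show that the corresponding key partitioning method improves Hive join query performance.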