Computer Science ›› 2013, Vol. 40 ›› Issue (6): 152-154.

Research of Distributed ETL Architecture Based on MapReduce

SONG Jie, HAO Wen-ning, CHEN Gang, JIN Da-wei and ZHAO Shui-ning

  • Online:2018-11-16 Published:2018-11-16

Abstract: To address the shortcomings of the centralized execution mode used by traditional extraction-transformation-loading (ETL) tools, this paper proposes MDETL (MapReduce Distributed ETL), a distributed ETL architecture based on MapReduce. The architecture applies MapReduce, a parallel programming model for processing massive data sets, together with cluster computing, so that ETL processing is distributed across the nodes of a cluster. It improves the flexibility and throughput of the overall ETL system, offers better scalability and load balancing, and raises execution efficiency.

Key words: ETL, MapReduce, Distributed
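
Illustrative sketch: the abstract does not give MDETL's internals, so the following is only a minimal, single-process illustration of how an ETL job can be expressed as map and reduce phases. It is not the authors' implementation. The sample rows, the function names (map_phase, shuffle, reduce_phase, run_etl) and the in-memory "warehouse" dictionary are all assumptions made for this example; in a real cluster the framework would run the map and reduce functions on different nodes.

```python
from collections import defaultdict

# Hypothetical source rows ("region,amount"); one row is deliberately malformed.
RAW_ROWS = ["north,10", "south,5", "north,7", "bad-row", "south,3"]

def map_phase(row):
    """Extract + transform: parse one raw row and emit (key, value) pairs."""
    parts = row.split(",")
    if len(parts) != 2:
        return []  # cleaning step: drop rows that do not parse
    region, amount = parts
    return [(region.strip(), int(amount))]

def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values of one key into a single load-ready record."""
    return key, sum(values)

def run_etl(rows):
    """Run map, shuffle and reduce in sequence and 'load' into a dict."""
    pairs = [pair for row in rows for pair in map_phase(row)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

warehouse = run_etl(RAW_ROWS)  # stand-in for the load target
print(warehouse)
```

Because each call to map_phase depends only on its own row, and each call to reduce_phase only on its own key, both phases can in principle be spread over many machines, which is the property a MapReduce-based ETL design relies on for throughput and load balancing.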
