Computer Science ›› 2013, Vol. 40 ›› Issue (Z11): 263-266.


Research of Distributed ETL Dimensional Data Model Based on MapReduce

SONG Jie, HAO Wen-ning, CHEN Gang, JIN Da-wei and ZHAO Cheng

  • Online:2018-11-16 Published:2018-11-16

Abstract: Because MapReduce lacks support for high-level, ETL-specific constructs, this paper presents a parallel dimensional ETL framework based on MapReduce (MapReduce Distributed ETL, MDETL). MDETL breaks data processing into composable parts (the processing of dimensions and the processing of facts) and directly supports high-level, ETL-specific dimensional constructs. The framework's performance is evaluated on large, realistic data sets, and the experimental results show that MDETL achieves very good scalability.

Key words: ETL, MapReduce, MDETL, Dimensions, Facts
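
The split into dimension processing and fact processing described in the abstract can be illustrated with a small sketch. It is not the MDETL API, which the abstract does not specify. It is a single-process stand-in for a MapReduce job, and every name in it (`run_mapreduce`, `build_dimension`, `load_facts`, the `product`/`category`/`amount` fields) is hypothetical. It assumes the dimension job assigns surrogate keys per business key and the fact job swaps business keys for those surrogate keys before aggregating a measure.

```python
from collections import defaultdict
from itertools import count


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job (no cluster involved)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Keys are sorted so that the output order is deterministic.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def build_dimension(rows, business_key, attributes):
    """Dimension job: deduplicate on the business key, assign surrogate keys."""
    surrogate = count(1)

    def mapper(row):
        # Emit only the dimension attributes, grouped by business key.
        yield row[business_key], {name: row[name] for name in attributes}

    def reducer(key, values):
        # Keeps the last value seen; a real job would apply its own update rules.
        yield key, {"sk": next(surrogate), **values[-1]}

    return dict(run_mapreduce(rows, mapper, reducer))


def load_facts(rows, dimension, business_key, measure):
    """Fact job: look up surrogate keys, then sum the measure per key."""

    def mapper(row):
        yield dimension[row[business_key]]["sk"], row[measure]

    def reducer(sk, values):
        yield {"product_sk": sk, measure: sum(values)}

    return run_mapreduce(rows, mapper, reducer)


# Hypothetical source rows, for illustration only.
source = [
    {"product": "pen", "category": "office", "amount": 3},
    {"product": "ink", "category": "office", "amount": 5},
    {"product": "pen", "category": "office", "amount": 4},
]
product_dim = build_dimension(source, "product", ["category"])
facts = load_facts(source, product_dim, "product", "amount")
```

In this sketch the two jobs compose only through the dimension lookup table, so the dimension job must finish before the fact job starts. The in-memory dictionary stands in for whatever shared lookup a distributed implementation would use.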

