Computer Science, 2013, Vol. 40, Issue (Z11): 263-266.

• Data Storage and Mining •

Research on a MapReduce-Based Distributed ETL Multidimensional Data Model

Song Jie, Hao Wen-ning, Chen Gang, Jin Da-wei, Zhao Cheng

  1. PLA University of Science and Technology, Nanjing 210007
  • Published: 2018-11-16  Online: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China.

Research on Distributed ETL Dimensional Data Model Based on MapReduce

SONG Jie, HAO Wen-ning, CHEN Gang, JIN Da-wei and ZHAO Cheng

  • Online: 2018-11-16  Published: 2018-11-16

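Abstract: To address MapReduce's lack of a concrete description of the high-level data model used in ETL, this paper proposes an integrated MapReduce-based distributed ETL method (MapReduce Distributed ETL, MDETL) for processing multidimensional data models. MDETL decomposes data processing into the processing of data attributes (dimensions and facts), which solves the problem of building a concrete high-level data model for ETL. Its performance was evaluated on real data sets, and the experimental results show that MDETL scales well.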

Keywords: ETL, MapReduce, MDETL, dimensions, facts

Abstract: Because MapReduce lacks support for high-level ETL-specific constructs, this paper presents a parallel dimensional ETL framework based on MapReduce (MapReduce Distributed ETL, MDETL), which decomposes data processing into composable operations on dimensions and facts and directly supports high-level, ETL-specific dimensional constructs. Its performance was evaluated on large, realistic data sets, and the experimental results show that MDETL achieves very good scalability.

Key words: ETL, MapReduce, MDETL, Dimensions, Facts
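The abstract describes MDETL only at a high level: ETL work is split into dimension processing and fact processing, each run as MapReduce jobs. The Python sketch below is not the authors' implementation. It only illustrates that general idea under stated assumptions: a hypothetical two-dimension star schema (`date`, `product`) with one additive measure (`amount`), surrogate keys numbered 1..n per dimension, and an in-memory `run_mapreduce` helper standing in for a real Hadoop job. All names here (`run_mapreduce`, `process_dimensions`, `process_facts`, `SAMPLE_ROWS`) are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical star schema: two dimensions and one additive measure.
DIMENSIONS = ("date", "product")
MEASURE = "amount"

# Hypothetical source rows extracted from an operational system.
SAMPLE_ROWS = [
    {"date": "2013-01-01", "product": "pen", "amount": 3},
    {"date": "2013-01-01", "product": "ink", "amount": 5},
    {"date": "2013-01-02", "product": "pen", "amount": 2},
    {"date": "2013-01-01", "product": "pen", "amount": 4},
]


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle/sort, reduce."""
    pairs = [kv for record in records for kv in mapper(record)]
    pairs.sort(key=itemgetter(0))  # shuffle: bring equal keys together
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def process_dimensions(rows):
    """Job 1: collect each dimension's members and assign surrogate keys."""
    def mapper(row):
        for dim in DIMENSIONS:
            yield dim, row[dim]

    def reducer(dim, members):
        # Deduplicate and number members 1..n; sorting keeps keys deterministic.
        for key, member in enumerate(sorted(set(members)), start=1):
            yield dim, member, key

    tables = defaultdict(dict)
    for dim, member, key in run_mapreduce(rows, mapper, reducer):
        tables[dim][member] = key
    return dict(tables)


def process_facts(rows, dim_tables):
    """Job 2: replace dimension values with surrogate keys and sum the measure."""
    def mapper(row):
        key = tuple(dim_tables[dim][row[dim]] for dim in DIMENSIONS)
        yield key, row[MEASURE]

    def reducer(key, amounts):
        yield key, sum(amounts)

    return run_mapreduce(rows, mapper, reducer)


if __name__ == "__main__":
    dims = process_dimensions(SAMPLE_ROWS)
    print(dims)
    print(process_facts(SAMPLE_ROWS, dims))
```

Running the script prints the two dimension tables (`ink` and `pen` get keys 1 and 2, the two dates get keys 1 and 2) and the fact rows aggregated by `(date key, product key)`. The dimension job must finish before the fact job because fact rows reference the surrogate keys it produces; in a real cluster, the dimension tables would typically be distributed to the fact mappers rather than passed in memory.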

