Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 537-541.

• Software Engineering and Database Technology •

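Survey of MapReduce Parallel Programming Model

DU Jiang, ZHANG Zheng, ZHANG Jie-xin and TAI Ming

  1. PLA Information Engineering University, Zhengzhou 450001; State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001 (affiliation of all four authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the 863 Program key project "Research and Development of New-Concept High-Efficiency Computer Architecture and Systems" (2009AA012200), the Shanghai Scientific Research Program project "Research and Development of New-Concept High-Efficiency Computer Architecture and Systems" (08dz1501600), and the Shanghai Scientific Research Program project "Development of a Verification Platform for Mimic Security Principles" (13dz1108800).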

Abstract: The MapReduce parallel programming model reduces the complexity of parallel programming. By calling a simple interface backed by a runtime support library, programmers can have large-scale parallel computing tasks executed concurrently and automatically, without attending to the underlying implementation details. As a result, MapReduce delivers strong computing power on large clusters of low- and mid-performance machines while keeping costs down. This paper surveys research on the MapReduce parallel programming model in China and abroad, analyzes the strengths and weaknesses of current results, and discusses the model's future development.

Key words: MapReduce, parallel programming model, parallel computing, massive data processing
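
The interface described in the abstract can be illustrated with a minimal single-process sketch of a word count. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical; they are not taken from the paper or from any MapReduce library. The sketch only mimics the map, group-by-key, and reduce phases in one process, whereas a real runtime would distribute these calls across cluster nodes and handle scheduling and fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(document: str) -> Iterator[Tuple[str, int]]:
    """User-supplied map step: emit (word, 1) for every word in a document."""
    for word in document.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """User-supplied reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical stand-in for the runtime: map, group by key, then reduce.

    Everything runs sequentially here; this is an assumption made for
    illustration, not how a cluster runtime executes the job.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["map reduce map", "reduce shuffle"]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

The programmer writes only `map_fn` and `reduce_fn`; grouping and execution order are left to the runtime, which is the division of labor the abstract refers to.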

