Computer Science ›› 2015, Vol. 42 ›› Issue (Z6): 537-541.

• Software Engineering and Database Technology •

Survey of MapReduce Parallel Programming Model

DU Jiang, ZHANG Zheng, ZHANG Jie-xin and TAI Ming

  1. PLA Information Engineering University, Zhengzhou 450001; State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    The 863 Program key project "Research and Development of New-Concept High-Efficiency Computer Architecture and Systems" (2009AA012200), the Shanghai scientific research program project "Research and Development of New-Concept High-Efficiency Computer Architecture and Systems" (08dz1501600), and the Shanghai scientific research program project "Development of a Mimic Security Principle Verification Platform" (13dz1108800)

Abstract: The MapReduce parallel programming model reduces the complexity of parallel programming. By calling a simple interface backed by a runtime support library, programmers can have large-scale parallel computing tasks executed concurrently and automatically, without attending to the underlying implementation details. As a result, MapReduce delivers strong computing power on large clusters of low- to mid-performance machines while keeping costs down. This paper surveys research on the MapReduce parallel programming model in China and abroad, analyzes the strengths and weaknesses of current results, and discusses future directions for MapReduce.

Key words: MapReduce, Parallel programming model, Parallel computing, Massive data processing
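
The abstract's central claim is that the programmer supplies only a map function and a reduce function, while the runtime handles concurrent execution. The sketch below is a minimal single-machine illustration of that division of labor, not code from the paper. The names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and a thread pool stands in for a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """User-supplied map: emit (word, 1) for each word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """User-supplied reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn, workers=4):
    """Toy runtime (assumption: threads stand in for cluster nodes)."""
    # Map phase: the runtime, not the user, decides how tasks run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))

    # Shuffle phase: group every emitted value under its key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


lines = ["map reduce map", "reduce shuffle", "map"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
```

The user-written functions contain no threading or scheduling logic. A real framework would add input partitioning, fault tolerance, and distributed storage behind the same kind of interface.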

