Computer Science, 2015, Vol. 42, Issue (11): 65-67. doi: 10.11896/j.issn.1002-137X.2015.11.013

• 2014 National Annual Conference on High Performance Computing •

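MapReduce Parallel Model Based on Tree Structure

TANG Bing and HE Hai-wu

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by a research project of the French National Research Agency (ANR-10-SEGI-001-01), the Hundred Talents Program of the Chinese Academy of Sciences (1101002001), the Natural Science Foundation of Hunan Province (2015JJ3071), and a General Project of the Education Department of Hunan Province (12C0121).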

Abstract: MapReduce is a distributed computing model introduced by Google that has been widely used in massive data processing. This paper presents a novel MapReduce parallel model based on a tree structure. The model is suited to analyzing massive scientific data on unreliable desktop PC resources in Internet or Intranet environments. Computing nodes are organized in a P2P fashion: the lower layer uses the P2P-MPI framework, and the MapReduce application layer is implemented on top of it using message passing. In the MapReduce application layer, data chunks are distributed by broadcast in the Map stage, and an inverse binary tree is built in the Reduce stage to merge and reduce intermediate results efficiently. The proposed MapReduce model is compared with existing mainstream MapReduce models. The results show that the tree-structure-based MapReduce parallel model performs well in terms of fault tolerance, and that the system is simple and makes application development easy.

Key words: MapReduce, Tree structure, Binary tree, Message passing interface (MPI)
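
The Reduce stage described in the abstract merges intermediate results along an inverse binary tree. The sketch below illustrates that general idea in a single Python process; it is not the authors' P2P-MPI implementation. The word-count workload, the function names `map_chunk` and `tree_reduce`, and the pairing rule (node r absorbs node r + stride) are all assumptions made for illustration, and the broadcast-based chunk distribution is not modeled: each simulated node is simply handed one chunk.

```python
from collections import Counter


def map_chunk(chunk):
    """Map step (hypothetical workload): count the words in one text chunk."""
    return Counter(chunk.split())


def tree_reduce(partials):
    """Merge per-node partial results pairwise, from the leaves to the root.

    In each round, node r (where r % (2 * stride) == 0) absorbs the result
    held by node r + stride, if that node exists. The number of active
    nodes halves every round, so n results collapse into node 0 after
    ceil(log2(n)) rounds.
    """
    results = list(partials)
    n = len(results)
    if n == 0:
        return Counter(), 0

    stride = 1
    rounds = 0
    while stride < n:
        for r in range(0, n, 2 * stride):
            partner = r + stride
            if partner < n:
                # In a message-passing setting this would be a receive
                # from `partner` followed by a local merge on node r.
                results[r] = results[r] + results[partner]
        stride *= 2
        rounds += 1
    return results[0], rounds


# Five simulated nodes, one chunk each (illustrative data only).
chunks = ["a b a", "b c", "a c c", "d", "a d"]
partials = [map_chunk(c) for c in chunks]
total, rounds = tree_reduce(partials)
print(dict(total), rounds)
```

With five simulated nodes the merge finishes in three rounds, whereas a sequential merge at a single collector would need four steps. The gap grows with the node count, which is the usual motivation for a tree-shaped reduction.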

