Computer Science ›› 2016, Vol. 43 ›› Issue (11): 6-11. doi: 10.11896/j.issn.1002-137X.2016.11.002

Review of Research and Application on Hadoop in Cloud Computing

XIA Jing-bo, WEI Ze-kun, FU Kai and CHEN Zhen   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Hadoop is one of the most popular technologies in cloud computing and big data, and the combination of its software ecosystem with Spark is influencing both academic research and business models. This paper first introduces the origin and advantages of Hadoop and explains the relevant technical principles, including MapReduce, HDFS, YARN, and Spark. It then analyzes current academic research on Hadoop, summarized in four areas: improvement and innovation of the MapReduce algorithm; optimization and innovation of HDFS technology; secondary development and combination with other technologies; and innovation and practice in application fields. Next, it describes the state of domestic and international applications. The combination of Hadoop with Spark is the trend of the future. Finally, the paper discusses directions for future research and some crucial problems that urgently need to be solved.

Key words: Cloud computing, Big data, Hadoop, Spark, MapReduce

