Computer Science ›› 2014, Vol. 41 ›› Issue (12): 155-159.doi: 10.11896/j.issn.1002-137X.2014.12.033

Previous Articles     Next Articles

Research on Optimization of Sorting Algorithm under MapReduce

JIN Jing   

  • Online:2018-11-14 Published:2018-11-14

Abstract: This specification is set for the theses to be published in computer applications and software,including fonts,margins,page size and print area.MapReduce is the standard parallel computing model on big data analysis.Ideally,a MapReduce system should make all nodes involved in computation highly load balancing,and minimize space usage,CPU,disk and network overhead.In fact,these principles are the MapReduce algorithm design guidelines.However,few studies on optimization achieve all these metrics simultaneously.To solve this problem,this paper firstly presented the criterions of optimal MapReduce algorithms,which should keep excellent parallelism and optimize all metrics simultaneously.In this paper,we studied sorting algorithm,which is the most important algorithm on data processing,and proposed an optimal sorting algorithm under MapReduce paradigm.Then we illustrated the effectiveness and efficiency by the theory and experiments.

Key words: MapReduce,Optimization,Big data analysis,Sorting algorithm

[1] Dean J,Ghemawat S.MapReduce:simplified data processing onlarge clusters[J].Communications of the ACM,2008,51(1):107-113
[2] Kwon Y,Balazinska M,Howe B,et al.Skewtune:mitigatingskew in mapreduce applications[C]∥SIGMOD.2012:25-36
[3] Hodoop Wiki.Apache Hadoop.http://hadoop.apache.org
[4] Abouzeid A,Bajda-Pawlikowski K,Abadi D,et al.HadoopDB:an architectural hybrid of MapReduce and DBMS technologies for analytical workloads[J].Proceedings of the VLDB Endowment,2009,2(1):922-933
[5] Abouzied A,Bajda-Pawlikowski K,Huang J,et al.HadoopDB in action:building real world applications[C]∥Proceedings of the 2010 ACM SIGMOD International Conference on Management Df data.ACM,2010:1111-1114
[6] Dittrich J,Quiané-Ruiz J A,Jindal A,et al.Hadoop++:Making a yellow elephant run like a cheetah (without it even noticing)[J].Proceedings of the VLDB Endowment,2010,3(1/2):515-529
[7] Elghandour I,Aboulnaga A.ReStore:reusing results of MapRe-duce jobs[J].Proceedings of the VLDB Endowment,2012,5(6):586-597
[8] Nykiel T,Potamias M,Mishra C,et al.MRShare:sharing across multiple queries in MapReduce[J].Proceedings of the VLDB Endowment,2010,3(1/2):494-505
[9] Eltabakh M Y,Tian Y,Ozcan F,et al.CoHadoop:flexible data placement and its exploitation in Hadoop[J].Proceedings of the VLDB Endowment,2011,4(9):575-585
[10] He Y,Lee R,Huai Y,et al.RCFile:A fast and space-efficient data placement structure in MapReduce-based warehouse systems[C]∥2011 IEEE 27th International Conference on Data Engineering (ICDE).IEEE,2011:1199-1208
[11] Floratou A,Patel J M,Shekita E J,et al.Column-oriented storage techniques for MapReduce[J].Proceedings of the VLDB Endowment,2011,4(7):419-429
[12] Shinnar A,Cunningham D,Saraswat V,et al.M3R:increasedperformance for in-memory Hadoop jobs[J].Proceedings of the VLDB Endowment,2012,5(12):1736-1747
[13] Gufler B,Augsten N,Reiser A,et al.Load balancing in mapreduce based on scalable cardinality estimates[C]∥2012 IEEE 28th International Conference on Data Engineering (ICDE).IEEE,2012:522-533
[14] Kolb L,Thor A,Rahm E.Load balancing for mapreduce-basedentity resolution[C]∥2012 IEEE 28th International Conference on Data Engineering (ICDE).IEEE,2012:618-629
[15] Blanas S,Patel J M,Ercegovac V,et al.A comparison of join algorithms for log processing in mapreduce[C]∥Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.ACM,2010:975-986
[16] Afrati F N,Ullman J D.Optimizing multiway joins in a map reduce environment[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(9):1282-1298
[17] Lin Y,Agrawal D,Chen C,et al.Llama:leveraging columnarstorage for scalable join processing in the MapReduce framework[C]∥Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data.ACM,2011:961-972
[18] Vernica R,Carey M J,Li C.Efficient parallel set-similarity joins using MapReduce[C]∥Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data.ACM,2010:495-506
[19] Metwally A,Faloutsos C.V-smart-join:a scalable mapreduceframework for all-pair similarity joins of multisets and vectors[J].Proceedings of the VLDB Endowment,2012,5(8):704-715
[20] Zhang X,Chen L,Wang M.Efficient multi-way theta-join processing using MapReduce[J].Proceedings of the VLDB Endowment,2012,5(11):1184-1195
[21] Bahmani B,Chakrabarti K,Xin D.Fast personalized pagerank on mapreduce[C]∥Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data.ACM,2011:973-984
[22] Lattanzi S,Moseley B,Suri S,et al.Filtering:a method for solving graph problems in mapreduce[C]∥Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures.ACM,2011:85-94
[23] Lu W,Shen Y,Chen S,et al.Efficient processing of k nearest neighbor joins using mapreduce[J].Proceedings of the VLDB Endowment,2012,5(10):1016-1027
[24] 鲁伟明,杜晨阳,魏宝刚,等.基于MapReduce的分布式近邻传播聚类算法[J].计算机研究与发展,2012,49(8):1762-1772
[25] 亓开元,韩燕波,赵卓峰,等.支持高并发数据流处理的MapReduce中间结果缓存[J].计算机研究与发展,2013,50(1):111-121
[26] 刘义,静宁,陈荦,等.MapReduce框架下基于R-树的k-近邻连接算法[J].软件学报,2013,4(8):1836-1851
[27] 和亮,冯登国,王蕊,等.基于MapReduce的大规模在线社交网络蠕虫仿真[J].软件学报,2013,4(7):1666-1682
[28] 程兴国,肖南峰.粗粒度并行遗传算法的MapReduce并行化实现[J].重庆理工大学学报:自然科学版,2013,27(10):66-70

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!