Computer Science ›› 2017, Vol. 44 ›› Issue (8): 46-53. doi: 10.11896/j.issn.1002-137X.2017.08.009


Efficiency Optimization Method for MapReduce Similarity Computing Based on Spark

LIAO Bin, ZHANG Tao, YU Jiong, GUO Bing-lei and LIU Yan   

  • Online: 2018-11-13  Published: 2018-11-13

Abstract: With the exponential growth of both Internet users and content, similarity computation on big data needs to become more efficient. Because Spark is well suited to iterative and interactive tasks, we analyzed the implementation of the similarity computation algorithm and ported the algorithm, which is based on 2D partitioning, from MapReduce to Spark. We then improved its efficiency further through parameter tuning, memory optimization, and related measures. Experiments with 2 data sets on clusters of 3 different sizes indicate that the execution efficiency of the algorithm on Spark is 4.715 times higher than on MapReduce, and that its energy consumption is only 24.86% of Hadoop's average energy consumption, which corresponds to an energy efficiency about 4 times higher than Hadoop's.
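
The "about 4 times" energy-efficiency figure can be checked against the reported 24.86% energy consumption. The following is a minimal worked calculation, assuming that energy efficiency means work completed per unit of energy and that both platforms complete the same workload W. The symbols E, W, and η are introduced here for illustration and do not come from the paper.

```latex
\[
\frac{E_{\mathrm{Spark}}}{E_{\mathrm{Hadoop}}} = 0.2486
\quad\Longrightarrow\quad
\frac{\eta_{\mathrm{Spark}}}{\eta_{\mathrm{Hadoop}}}
  = \frac{W / E_{\mathrm{Spark}}}{W / E_{\mathrm{Hadoop}}}
  = \frac{E_{\mathrm{Hadoop}}}{E_{\mathrm{Spark}}}
  = \frac{1}{0.2486}
  \approx 4.02
\]
```

Under these assumptions the ratio is roughly 4.02, which is consistent with the abstract's rounded statement.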

Key words: Similarity computing, MapReduce, Spark optimization, Energy optimization

