Computer Science (计算机科学), 2017, Vol. 44, Issue (6): 31-35. doi: 10.11896/j.issn.1002-137X.2017.06.005
2016 National Annual Conference on Information Storage Technology
孟红涛,余松平,刘芳,肖侬
MENG Hong-tao, YU Song-ping, LIU Fang and XIAO Nong
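Abstract: Spark is a big data processing framework based on the MapReduce model. Spark can make full use of the memory of a cluster and thereby speed up data processing. Spark divides memory by function into separate regions (Shuffle Memory, Storage Memory and Unroll Memory), and each region is used in a different way. First, the usage characteristics of Shuffle Memory and Storage Memory are tested and analyzed. The RDD is the most important abstraction in Spark and can be cached in cluster memory; when memory runs short, some RDD partitions must be evicted. Next, a new distributed weight-based caching strategy for RDDs is proposed. It computes the weight of each RDD partition from its storage time, size, number of uses and other factors, and selects the partitions to evict according to the distributed characteristics of RDDs. Finally, the performance of several caching strategies is tested and analyzed.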
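The abstract does not give the weighting formula or define the "distributed characteristics" precisely, so the Python sketch below only illustrates the general shape of weight-based eviction under stated assumptions. The weight `use_count / (size_mb * age)` is a hypothetical choice (frequently used, small, recently stored partitions are kept longer), and the `evicted_rdds` set is a hypothetical stand-in for the distributed criterion: partitions of RDDs that have already lost partitions elsewhere in the cluster are evicted first, on the assumption that any job reading such an RDD must recompute part of it anyway. All class and function names are illustrative, not Spark APIs.

```python
from dataclasses import dataclass


@dataclass
class CachedPartition:
    rdd_id: int
    index: int
    size_mb: float    # size of the cached partition in MB
    cached_at: float  # time the partition was stored (seconds)
    use_count: int    # number of times the partition has been read


def weight(p: CachedPartition, now: float) -> float:
    """Hypothetical weight: higher means more worth keeping in memory."""
    age = max(now - p.cached_at, 1.0)  # guard against division by zero
    return p.use_count / (p.size_mb * age)


def choose_victims(partitions, needed_mb, now, evicted_rdds):
    """Select partitions to evict until at least needed_mb is freed.

    evicted_rdds: ids of RDDs that already lost partitions on other nodes
    (an assumed interpretation of the paper's distributed criterion).
    """
    def key(p):
        # False sorts before True, so incomplete RDDs are evicted first;
        # within each group, the lowest-weight partitions go first.
        return (p.rdd_id not in evicted_rdds, weight(p, now))

    victims, freed = [], 0.0
    for p in sorted(partitions, key=key):
        if freed >= needed_mb:
            break
        victims.append(p)
        freed += p.size_mb
    return victims


# Illustrative data only; not measurements from the paper.
parts = [
    CachedPartition(rdd_id=1, index=0, size_mb=100.0, cached_at=0.0, use_count=10),
    CachedPartition(rdd_id=2, index=0, size_mb=50.0, cached_at=0.0, use_count=1),
    CachedPartition(rdd_id=3, index=0, size_mb=200.0, cached_at=50.0, use_count=5),
]
print([p.rdd_id for p in choose_victims(parts, 60, 100.0, set())])  # [2, 3]
print([p.rdd_id for p in choose_victims(parts, 60, 100.0, {1})])    # [1]
```

Without the distributed criterion, the two lowest-weight partitions are evicted; once RDD 1 is marked as already incomplete, its single large partition is sacrificed instead, leaving RDDs 2 and 3 fully cached.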