Computer Science (计算机科学), 2015, Vol. 42, Issue (Z6): 332-336.
伍秋平,刘 波,林伟伟
WU Qiu-ping, LIU Bo and LIN Wei-wei
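Abstract: By default, Hadoop achieves data fault tolerance through replica redundancy, but this approach consumes excessive space and has low storage efficiency. To address this, based on an analysis of the ARC cache replacement algorithm, this paper proposes ARCMFF, an ARC cache replacement mechanism for data fault tolerance in cloud storage. During file access, ARCMFF maintains an LRU queue and an LFU queue to identify frequently accessed files and adds them to the cache system to improve access performance. In ARCMFF, most files are stored fault-tolerantly using erasure codes; only files in the cache are stored with replica redundancy. Because erasure codes have high coding efficiency, the system saves a large amount of storage space. Experimental results show that, in a distributed file system, ARCMFF saves file storage space, greatly improves the storage efficiency of Hadoop, and improves file write performance to some extent.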
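The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TwoQueueTracker`, the function `stored_bytes`, the replication factor of 3, and the 6+3 erasure-code layout are all assumptions chosen for illustration. The sketch also omits ARC's ghost lists and adaptive sizing, and its "frequent" queue simply holds files seen at least twice, kept in recency order.

```python
from collections import OrderedDict

REPLICA_FACTOR = 3          # assumed replication factor for cached (hot) files
EC_DATA, EC_PARITY = 6, 3   # hypothetical erasure-code layout: 6 data + 3 parity blocks


class TwoQueueTracker:
    """Simplified ARC-style tracker with a recency queue and a frequency queue.

    Hypothetical helper: real ARC also keeps ghost lists and adapts queue sizes.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # files seen once, oldest first
        self.frequent = OrderedDict()  # files seen at least twice, oldest first

    def access(self, name):
        if name in self.frequent:
            # Already hot: refresh its position.
            self.frequent.move_to_end(name)
        elif name in self.recent:
            # Second access: promote to the frequency queue.
            del self.recent[name]
            self.frequent[name] = True
            if len(self.frequent) > self.capacity:
                self.frequent.popitem(last=False)
        else:
            # First access: record in the recency queue.
            self.recent[name] = True
            if len(self.recent) > self.capacity:
                self.recent.popitem(last=False)

    def is_hot(self, name):
        return name in self.frequent


def stored_bytes(size, hot):
    """Bytes on disk: replication for hot files, erasure coding for the rest."""
    if hot:
        return size * REPLICA_FACTOR
    return size * (EC_DATA + EC_PARITY) / EC_DATA


tracker = TwoQueueTracker(capacity=2)
for name in ["a", "b", "a", "c"]:
    tracker.access(name)
```

Under these assumed parameters, a hot file occupies three times its size while an erasure-coded file occupies 1.5 times its size, which is the source of the space saving the abstract describes.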