Computer Science (计算机科学), 2014, Vol. 41, Issue (7): 30-35. doi: 10.11896/j.issn.1002-137X.2014.07.005

• Survey •

Design and Implementation of a File Prefetching Model for Distributed File Systems

SHI Ming (师明), LIU Yi (刘轶), TANG Ge-shi (唐歌实)

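  1. Beijing Aerospace Control Center, Beijing 100094; Science and Technology on Aerospace Flight Dynamics Laboratory, Beijing 100094; Sino-German Joint Software Institute, School of Computer Science and Engineering, Beihang University, Beijing 100191; Beijing Aerospace Control Center, Beijing 100094; Science and Technology on Aerospace Flight Dynamics Laboratory, Beijing 100094
  • Publication date: 2018-11-14  Release date: 2018-11-14
  • Funding:
    This work was supported by a major project in the information technology field of the National 863 Program of China during the 12th Five-Year Plan, "Key Technologies and Systems for Cloud Computing", under the topic "Development of a Search Engine Primarily Serving Public Chinese-Language Services" (2011AA01A205).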

Design and Implementation of File Prefetching Module Oriented to Distributed File System

SHI Ming, LIU Yi and TANG Ge-shi

  • Online:2018-11-14 Published:2018-11-14

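Abstract: Providing stable and efficient file I/O performance for upper-layer applications and computation is a hot topic in research on distributed file system performance. This paper analyzes the features that distributed file systems share in their design mechanisms, proposes on that basis a general-purpose heuristic file prefetching model, and implements it on the HDFS platform. The heuristic file prefetching is transparent to upper-layer applications. It is implemented inside the distributed file system by establishing a prefetching thread pool within the file system and by using the data storage files that make up file blocks as the prefetching unit. This design approach is fairly general and is suitable for extension to a variety of distributed file systems. Experimental results show that the heuristic file prefetching described here can effectively improve the I/O performance of distributed file systems.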

Keywords: distributed file system, file prefetching, heuristic, HDFS
CLC number: TP393  Document code: A

Abstract: How to provide stable and efficient file I/O performance for upper-layer applications and computation is a hot topic in distributed file system performance research. This paper analyzes the common features in the design mechanisms of distributed file systems, presents a general-purpose heuristic file prefetching module, and implements it on the HDFS platform. The heuristic file prefetching module is transparent to upper-layer applications and is implemented inside the distributed file system: it establishes a prefetching thread pool within the file system and uses the data storage files that make up a block, rather than the whole block, as the prefetching unit. The idea is fairly general and is suitable for a variety of distributed file systems. Experimental results show that the heuristic file prefetching method can effectively improve distributed file system I/O performance.

Key words: Distributed file system, File prefetching, Heuristic, HDFS
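
The abstract describes prefetching driven by a thread pool inside the file system and hidden from the application. The following is only a minimal sketch of that general idea, not the authors' HDFS implementation: it assumes a simple sequential-access heuristic (after unit i is requested, read ahead unit i+1), and the names `HeuristicPrefetcher`, `fetch_unit`, `unit_count`, and `depth` are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, Tuple

UnitKey = Tuple[str, int]  # (file path, index of the prefetch unit)


class HeuristicPrefetcher:
    """Toy read path with read-ahead on a thread pool (illustrative only)."""

    def __init__(self, fetch_unit: Callable[[str, int], bytes],
                 unit_count: Callable[[str], int],
                 workers: int = 4, depth: int = 1) -> None:
        self._fetch_unit = fetch_unit    # hypothetical slow read from storage
        self._unit_count = unit_count    # hypothetical metadata lookup
        self._depth = depth              # how many units to read ahead
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # Unbounded cache of in-flight and finished fetches; a real system
        # would evict entries.
        self._pending: Dict[UnitKey, Future] = {}
        self._lock = Lock()

    def _schedule(self, path: str, index: int) -> Future:
        """Submit a fetch unless one already exists for this unit."""
        key = (path, index)
        with self._lock:
            fut = self._pending.get(key)
            if fut is None:
                fut = self._pool.submit(self._fetch_unit, path, index)
                self._pending[key] = fut
            return fut

    def read(self, path: str, index: int) -> bytes:
        """Return one unit; the caller never sees the prefetching."""
        demand = self._schedule(path, index)
        # Assumed heuristic: access is sequential, so start fetching the
        # following unit(s) while the demand read is still in progress.
        last = self._unit_count(path) - 1
        for nxt in range(index + 1, min(index + self._depth, last) + 1):
            self._schedule(path, nxt)
        return demand.result()

    def close(self) -> None:
        """Wait for outstanding prefetches and stop the pool."""
        self._pool.shutdown(wait=True)
```

In this sketch a caller simply invokes `read(path, index)`; when the next sequential request arrives, it is served from the fetch that was already submitted, so each unit is read from storage at most once.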

