Computer Science, 2015, Vol. 42, Issue (10): 76-80.

• Network and Communication •

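Storage Research of Small Files in Massive Education Resource

YOU Xiao-rong and CAO Sheng

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
  • Online: 2018-11-14 Published: 2018-11-14
  • Funding:
    This work was supported by the Ministry of Education-China Mobile Research Fund project "Research and Implementation of Key Technologies for Storage and Retrieval of Massive Education Resources" (MCM20121041)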

Abstract: As a mature distributed cloud platform, Hadoop provides reliable and efficient storage services and is widely used to store large files, but its efficiency drops significantly when it handles massive numbers of small files. To improve the efficiency of storing and accessing small files on Hadoop, we propose a storage optimization scheme for small files in massive education resources that exploits the correlations among those files. In the scheme, a set of correlated small files is merged into one large file to reduce the file count, an indexing mechanism is used to access individual small files, and metadata caching together with prefetching of correlated small files is used to improve read efficiency. Experimental results indicate that these methods improve the storage and access efficiency of small files on the Hadoop file system.

Key words: Hadoop, Massive small files, Small file merging, Prefetching and caching
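
The merge, index, cache, and prefetch ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: an in-memory byte string stands in for a merged file on HDFS, files that sit next to each other in the merged file are assumed to be the correlated ones, and every name here (merge_files, MergedReader, cache_size, prefetch) is hypothetical.

```python
from collections import OrderedDict


def merge_files(files):
    """Concatenate small files into one blob.

    `files` maps name -> bytes, in the order they should be stored.
    Returns (blob, index), where index maps name -> (offset, length).
    """
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob += data
    return bytes(blob), index


class MergedReader:
    """Reads small files out of a merged blob through an LRU cache.

    On a cache miss, the next `prefetch` files in merge order are loaded
    too, on the assumption (made for this sketch) that neighbours in the
    merged file are correlated and likely to be read next.
    """

    def __init__(self, blob, index, cache_size=4, prefetch=2):
        self.blob = blob
        self.index = index
        self.names = list(index)  # merge order
        self.position = {n: i for i, n in enumerate(self.names)}
        self.cache = OrderedDict()  # name -> bytes, oldest first
        self.cache_size = cache_size
        self.prefetch = prefetch
        self.hits = 0
        self.misses = 0

    def _load(self, name):
        # The index lookup replaces a scan of the merged file.
        offset, length = self.index[name]
        return self.blob[offset:offset + length]

    def _put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, name):
        if name in self.cache:
            self.hits += 1
            self.cache.move_to_end(name)
            return self.cache[name]
        self.misses += 1
        data = self._load(name)
        # Prefetch the following neighbours, then cache the requested file
        # last so that it is the most recently used entry.
        start = self.position[name] + 1
        for neighbour in self.names[start:start + self.prefetch]:
            if neighbour not in self.cache:
                self._put(neighbour, self._load(neighbour))
        self._put(name, data)
        return data
```

In this sketch, reading the first file of a course triggers one miss and loads its two successors, so a follow-up read of the next file is served from the cache. A real deployment would store the merged file and its index on HDFS and would need a correlation measure richer than adjacency.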

