Computer Science ›› 2016, Vol. 43 ›› Issue (Z11): 495-498. doi: 10.11896/j.issn.1002-137X.2016.11A.111

• Software Engineering and Database Technology •

Effective Image File Storage Technique Using Improved Data Deduplication

LI Feng, LU Ting-ting and GUO Jian-hua

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620; School of Computer Science and Technology, Donghua University, Shanghai 201620; School of Computer Science and Technology, Shanghai University, Shanghai 200444
  • Online: 2018-12-01 Published: 2018-12-01

Abstract: In cloud computing environments, the growth of Infrastructure as a Service (IaaS) has led to a sharp increase in the number of virtual machines and virtual machine images. For example, Amazon Elastic Compute Cloud (EC2) has 6521 public virtual machine image files. This poses a great challenge to the management of the cloud environment, in particular the storage space consumed by the duplicate data that so many image files contain. To address this problem, this paper proposes a storage scheme for image files based on deduplication with fixed-size blocks. When an image file is stored, its fingerprint is first computed and compared with the fingerprints in the fingerprint database. If the fingerprint already exists, the file is replaced with a pointer; otherwise, the image file is split into fixed-size blocks and stored. To support this, we designed an image file metadata format and an image file MD5 index table. Experiments show that storing an image file whose content is identical to an existing one costs only the metadata overhead and completes almost instantly, while the deduplication rate for a group of images with the same operating system and version but different installed software reaches about 58%. The scheme is therefore effective.

Key words: Cloud computing, Deduplication, Image file storage
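
The storage flow described in the abstract (whole-file fingerprint lookup first, fixed-size blocks only on a miss) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class `ImageStore`, its in-memory dictionaries, the 4096-byte default block size, and the use of MD5 for block fingerprints as well as file fingerprints are all assumptions made for illustration; the abstract names only an MD5 index table and a metadata format, without giving their layout.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical default; the abstract does not state a block size


class ImageStore:
    """Toy in-memory store: whole-file MD5 check, then fixed-size block dedup."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.file_index = {}  # file MD5 -> ordered list of block MD5s (assumed layout)
        self.blocks = {}      # block MD5 -> block bytes, each stored once
        self.metadata = {}    # image name -> file MD5 (the "pointer")

    def put(self, name, data):
        """Store an image and return the number of new block bytes written."""
        file_fp = hashlib.md5(data).hexdigest()
        written = 0
        if file_fp not in self.file_index:
            # Fingerprint miss: split into fixed-size blocks and
            # store only the blocks that are not already present.
            recipe = []
            for off in range(0, len(data), self.block_size):
                block = data[off:off + self.block_size]
                fp = hashlib.md5(block).hexdigest()
                if fp not in self.blocks:
                    self.blocks[fp] = block
                    written += len(block)
                recipe.append(fp)
            self.file_index[file_fp] = recipe
        # Hit or miss, the image name only records a pointer to the recipe.
        self.metadata[name] = file_fp
        return written

    def get(self, name):
        """Rebuild an image from its block recipe."""
        recipe = self.file_index[self.metadata[name]]
        return b"".join(self.blocks[fp] for fp in recipe)


def dedup_ratio(logical_bytes, stored_bytes):
    """Fraction of logical bytes that did not need to be stored."""
    return 1 - stored_bytes / logical_bytes


if __name__ == "__main__":
    store = ImageStore(block_size=4)  # tiny blocks so the demo is readable
    print(store.put("a.img", b"AAAABBBBCCCC"))     # 12: all three blocks are new
    print(store.put("copy.img", b"AAAABBBBCCCC"))  # 0: whole-file fingerprint hit
    print(store.put("b.img", b"AAAABBBBDDDD"))     # 4: only the last block is new
```

In this sketch an identical image costs one metadata entry and no block writes, which matches the abstract's metadata-only case. Images that share most blocks store only the blocks that differ. Fixed-size blocks keep the split cheap, but an insertion that shifts later content changes every subsequent block.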

