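Abstract: The ever-growing volume of enterprise data poses a severe challenge for data centers. Studies have found that up to 60% of the data in storage systems is redundant, and reducing this redundancy has attracted growing attention from researchers. Data deduplication spends CPU resources to compare data-block fingerprints and thereby effectively reduces the storage space required; it has become a research hotspot in both industry and academia. After analyzing and summarizing the deduplication literature of the past 10 years, this paper first examines the architecture of volume-level deduplication systems and describes the principles, implementation mechanisms, and evaluation criteria of deduplication systems. It then considers how data scale affects deduplication system performance, and analyzes and summarizes the various techniques for improving that performance. Finally, it compares deduplication systems across application scenarios and identifies four directions that need focused research: deduplication schemes for primary storage environments, deduplication schemes for distributed cluster environments, fast fingerprint lookup optimization, and intelligent data detection.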
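The block-fingerprint comparison mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: fixed-size 4096-byte blocks, SHA-1 as the fingerprint function, an in-memory dictionary as the fingerprint index, and the hypothetical function names `deduplicate`, `restore`, and `dedup_ratio`.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Assumptions for illustration only: fixed-size chunking, SHA-1
    fingerprints, and an in-memory dict standing in for the index.
    """
    store = {}   # fingerprint -> block contents (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha1(block).hexdigest()
        # A fingerprint already in the index means the block is a duplicate,
        # so only a reference to it is recorded.
        if fingerprint not in store:
            store[fingerprint] = block
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipe)


def dedup_ratio(data: bytes, store) -> float:
    """Logical bytes written divided by physical bytes actually stored."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    sample = b"A" * 4096 * 3 + b"B" * 4096  # four blocks, two of them unique
    store, recipe = deduplicate(sample)
    assert restore(store, recipe) == sample
    print(len(store), len(recipe), dedup_ratio(sample, store))
```

In this toy input, four logical blocks collapse to two stored blocks, so the ratio is 2.0. A real system would keep the fingerprint index on disk or flash and would have to handle lookup cost, which is the concern behind the fast fingerprint lookup direction named in the abstract.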