Computer Science ›› 2025, Vol. 52 ›› Issue (1): 120-130.doi: 10.11896/jsjkx.231200011
• Database & Big Data & Data Science • Previous Articles Next Articles
YAO Zilu1, FU Yinjin2, XIAO Nong1,2
CLC Number:
[1]国际数据中心IDC[EB/OL].https://www.idc.com/. [2]IBM数据生命周期管理[EB/OL].https://www.ibm.com/cn-zh/topics/data-lifecycle-management. [3]XIE P.Survey on Data Deduplication Techniques for StorageSystem.[J].Computer Science,2014,41(1):22-30,42. [4]FU Y,XIAO N,LIU F.Research and Development on KeyTechniques of Data Deduplication[J].Journal of Computer Research & Development,2012,49(1):12-20. [5]LEE T,MONGA S K,MIN C W,et al.Memtis:Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination[C]//ACM SIGOPS 29th Symposium on Operating Systems Principles.ACM,New York,NY,USA,2023. [6]HILDEBRAND M,KHAN J,TRIKA S,et al.AutoTM:Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming[C]//Architectural Support for Programming Languages and Operating Systems.ACM,2020. [7]Amazon Web Services.Amazon s3 price[EB/OL].https://www.amazonaws.cn/s3/pricing/. [8]Microsoft Azure.Storage Price [EB/OL].https://azure.mi-crosoft.com/zh-cn/pricing/details/storage/blobs/#pricing. [9]百度云对象存储BOS[EB/OL].https://cloud.baidu.com/pro-duct/bos.html. [10]阿里云对象存储OSS[EB/OL].https://www.aliyun.com/product/oss. [11]腾讯云对象存储COS[EB/OL].https://cloud.tencent.com/product/cos. [12]KOTLARSKA I,JACKOWSKI A,LICHOTA K,et al.Infty-Dedup:scalable and cost-effective cloud tiering with deduplication[C]//Proceedings of the 21st USENIX Conference on File and Storage Technologies.2023. [13]YANG Z Y,WANG Y F,BHAMIN I,et al.EAD:elasticityaware deduplication manager for datacenters with multi-tier sto-rage systems[J].Cluster Computing,2018,21(3):1561-1579. [14]WANG H,ZHANG J W,HUANG P,et al.Cache What YouNeed to Cache:Reducing Write Traffic in Cloud Cache via “One-Time-Access-Exclusion” Policy[J].ACM Transactions on Sto-rage,2020,16(3):1-24. [15]ETHEM A.Introduction to Machine Learning[J].MIT Press,Cambridge,MA,2014. [16]LEO B,FRIEDMAN J H,OLSHEN R A,et al.Classificationand Regression Trees[J].Biometrics,1984,40(3),358. [17]XIA W,JIANG H,FENG D,et al.A comprehensive study of the past,present,and future of data deduplication[J].Proceedings of the IEEE,2016,104(9):1681-1710. [18]WANG C D,WEI Q S,YANG J,et al.Nv-dedup:Highperformance inline deduplication for non-volatile memory[J].IEEE Transactions on Computers,2017,67(5):658-671. [19]QIU J S,PAN Y Q,XIA W,et al.Light-Dedup:A Light-weight Inline Deduplication Framework for Non-Volatile Memory File Systems[C]//USENIX Annual Technical Conference.2023:101-116. [20]BOTELHO F C,GARG N,SHILANE P N,et al.Memory efficient sanitization of a deduplicated storage system.中国专利:US9317218[P],2016.04.19. [21]DUGGAL A,JENKINS F,SHILANE P,et al.Data domain cloud tier:backup here,backup there,deduplicated everywhere![C]//USENIX Annual Technical Conference.2019. [22]ZHU B,LI K,PATTERSON H.Avoiding the disk bottleneck in the Data Domain deduplication file system[C]//6th USENIX Conference on File and Storage Technologies.2008. [23]MERKLE RALPH C.Digital signature system and methodbased on a conventional encryption function,US7967587A[P].1987.07.30. [24]ESHGHI K,LILLIBRIDGE M,WILCOCK L,et al.Jumbostore:Providing efficient incremental upload and versioning for a utility rendering service[C]//5th USENIX Conference on File and Storage Technologies.2007. [25]SONG L S,DENG Y H,XIE J J.Exploiting Fingerprint Prefetching to Improve the Performance of Data Deduplication[C]//IEEE International Conference on High Performance Computing & Communications & IEEE International Confe-rence on Embedded & Ubiquitous Computing.2013. [26]ZHOU Y T,DENG Y H,XIE J J.Leverage similarity and locality to enhance fingerprint prefetching of data deduplication[C]//IEEE International Conference on Parallel and Distributed Systems.2014. [27]ZHOU Y T,DENG Y H,CHEN X G,et al.Identifying file similarity in large data sets by modulo file length[C]//Algorithms and Architectures for Parallel Processing.2014. [28]MANKU G S,JAIN A,DAS S A.Detecting near-duplicates for web crawling[C]//International Conference on World Wide Web.ACM,2007. [29]CHARIKAR M S.Similarity estimation techniques from rounding algorithms[C]//Thiry-fourth Acm Symposium on Theory of Computing.ACM,2002:380-388. [30]INDYK P,MOTWANI R.Approximate nearest neighbors:towards removing the curse of dimensionality[C]//Proceedings of the 30th ACM Symposium on Theory of Computing(STOC’98)1998:604-613. [31]QIN Y B,ZHANG X B,DAVID J.PBCCF:Accelerated Deduplication by Prefetching Backup Content Correlated Fingerprints[C]//2020 IEEE 38th International Conference on Computer Design.2020. [32]GUO F,EFSTATHOPOULOS P.Building a high-performance deduplication system[C]//2011 USENIX Annual Technical Conference.USENIX Association,2011. [33]ZHANG Y C,XIA W,FENG D,et al.Finesse:Fine-grained feature locality based fast resemblance detection for postdeduplication delta compression[C]//USENIX FAST.2019. [34]PARK J,KIM J,KIM Y,et al.DeepSketch:A New MachineLearning-Based Reference Search Technique for Post-Deduplication Delta Compression[C]//20th USENIX Conference on File and Storage Technologies(FAST 22).2022:247-264. [35]LLOYD S P.Least squares quantization in PCM[J].IEEETrans.,1982,28(2):129-137. [36]SU S P,ZHANG C,HAN K,et al.Greedy hash:Towards fast optimization for accurate hash coding in cnn[C]//NIPS.2018:806-815. [37]Yahoo! Japan Corp.Neighborhood graph and tree for indexing high-dimensional data[EB/OL].https://github.com/yahoojapan/NGT. [38]ZUO P F,HUA Y,ZHAO M,et al.Improving the Performance and Endurance of Encrypted Non-Volatile Main Memory through Deduplicating Writes[C]//2018 51st Annual IEEE/ACM International Symposium on Microarchitecture(MICRO).ACM,2018. [39]JAULMES L,MORETO M,VALERO M,et al.A Vulnerability Factor for ECC-protected Memory[C]//2019 IEEE 25th International Symposium on On-Line Testing And Robust System Design(IOLTS).IEEE,2019. [40]DU C F,WU S Z,WU J P,et al.ESD:An ECC-assisted and Selective Deduplication for Encrypted Non-Volatile Main Memory[C]//2023 IEEE International Symposium on High-Perfor-mance Computer Architecture.2023. [41]YIN J W,TANG Y,DENG S G,et al.MUSE:A Multi-Tierdand SLA-Driven Deduplication Framework for Cloud Storage Systems[J].IEEE Transactions on Computers,2021,70(5):759-774. [42]SLEATOR D D,Tarjan R E.Amortized efficiency of list update paging rules[J].Communications of the ACM,1985,28(2):202-208. [43]MEGIDDO N.ARC:A self-tuning,low overhead Replacement cache[C]//USENIX File and Storaqe Technologies Conference(FAST’03).2003. [44]JIANG S.LIRS:An Efficient Low Inter-reference Recency Set Replacement Policy to Improve Buffer Cache Performance[C]//Proceedings of the International Conference on Measurements and Modeling of Computer Systems,2002. [45]ZHOU Y Y,PHILBIN J,LI K.The multi-queue replacement algorithm for second level buffer caches[C]//Proceedings of the USENIX Annual Technical Conference.CA,USA,2002:91-104. [46]WILKES T M W J.My cache or yours? Making storage more exclusive[C]//Proceedings of the General Track:2002 USENIX Annual Technical Conference.2002. [47]XIAO N,ZHAO Y J,LIU F,et al.Dual queues cache replacement algorithm based on sequentiality detection[J].Science China(Information Sciences),2012,55(1):191-199. [48]LI W J,GREGORY J B,JUAN R,et al.CacheDedup:in-line deduplication for flash caching[C]//Proceedings of the 14th Usenix Conference on File and Storage Technologies.2016. [49]CAO Z C,WEN H,GE X Z,et al.TDDFS A Tier-Aware Data Deduplication-Based File System[J].ACM Transactions on Storage,2019,15(1):4. [50]KISOUS R,KOLIKANT A,DUGGAL A,et al.The what,The from,and The to:The Migration Games in Deduplicated Systems[J].ACM Transactions on Storage,2022,18(4):1-29. [51]HARNIK D,HERSHCOVITCH M,SHATSKY Y,et al.Ske-tching volume capacities in deduplicated storage[C]//17th USENIX Conference on File and Storage Technologies.2019. [52]NACHMAN A,SHEINVALD S,KOLIKANT A,et al.Go-Seed:Optimal seeding plan for deduplicated storage[J].ACM Transactions on Storage,2021,17(3):1-28. [53]Gurobi[EB/OL].https://www.gurobi.com/. [54]LIU Y,WANG H,ZHOU K,et al.A survey on AI for storage[J].CCF Transactions on High Performance Computing,2022,4(3):233-264. |
[1] | MAO Zhixiong, LIU Zhinan, GAO Xuning, WANG Mengxiang, GONG Shufeng, ZHANG Yanfeng. Power-PCSR:An Efficient Dynamic Graph Storage Structure for Power-law Graphs [J]. Computer Science, 2024, 51(8): 56-62. |
[2] | ZHOU Yiteng, TANG Xin, JIN Luchao. Adaptive MSB Reversible Data Hiding Based Security Deduplication for Encrypted Images in Cloud Storage [J]. Computer Science, 2024, 51(12): 352-360. |
[3] | XU Kun, FU Yin-jin, CHEN Wei-wei, ZHANG Ya-nan. Research Progress on Blockchain-based Cloud Storage Security Mechanism [J]. Computer Science, 2021, 48(11): 102-115. |
[4] | LU Ye-shan. Common Issues and Case Analysis of System Data Migration [J]. Computer Science, 2019, 46(6A): 412-416. |
[5] | ZHANG Gui-peng, CHEN Ping-hua. Secure Data Deduplication Scheme Based on Merkle Hash Tree in HybridCloud Storage Environments [J]. Computer Science, 2018, 45(11): 187-192. |
[6] | ZHANG Yong, ZHANG Jie-hui and LIU Bin. Big Data Dynamic Migration Method Based on Global Load Balancing in Cloud Environment [J]. Computer Science, 2018, 45(1): 196-199. |
[7] | LI Feng, LU Ting-ting and GUO Jian-hua. Effective Image File Storage Technique Using Improved Data Deduplication [J]. Computer Science, 2016, 43(Z11): 495-498. |
[8] | XIE Ping. Survey on Data Deduplication Techniques for Storage Systems [J]. Computer Science, 2014, 41(1): 22-30. |
[9] | SHI Guang-yuan and ZHANG Yu. Hierarchical Storage Access Model Based on Multi-Attributes Measurement [J]. Computer Science, 2013, 40(Z11): 165-169. |
[10] | SHI Guang-yuan and ZHANG Yu. Research on Fuzzy Logic-based Model of Tiered Storage [J]. Computer Science, 2013, 40(Z11): 284-287. |
[11] | ZHENG Sheng and LI Tong. Data Placement Algorithm for Large-scale Storage System [J]. Computer Science, 2013, 40(Z11): 270-273. |
[12] | LUO Xiang-yu,WANG Yun and CHEN Xiao-mei. Evaluation and Analysis of Load Balancing Mechanisms in Storage Systems [J]. Computer Science, 2013, 40(9): 55-60. |
[13] | . Research on Evidence Collection under Cloud Computing Environment [J]. Computer Science, 2012, 39(9): 105-108. |
[14] | GE Xiong-zi,FENG Dan,LU Cheng-tao,JIN Chao. Dynamic Analysis Model of Green Network Storage Systems [J]. Computer Science, 2011, 38(8): 291-296. |
[15] | LIU Ke,QIN Lei-hua,ZHOU Jing-li,NIE Xue-jun,ZENG Dong. Two-phrase Retrieval Strategy in Content Aware Network Storage System [J]. Computer Science, 2011, 38(5): 20-23. |
|