计算机科学 ›› 2025, Vol. 52 ›› Issue (2): 48-57.doi: 10.11896/jsjkx.241000022

• 数据库&大数据&数据科学 • 上一篇    下一篇

三数据中心下的纠删码算法研究

孙婧1, 牛虹婷2, 梁松涛3   

  1. 1 华东政法大学智能科学与信息法学系 上海 201620
    2 北京航空航天大学计算机学院 北京 100191
    3 上海哔哩哔哩科技有限公司多媒体技术中心 上海 200433
  • 收稿日期:2024-10-08 修回日期:2024-11-26 出版日期:2025-02-15 发布日期:2025-02-17
  • 通讯作者: 孙婧(jingsuncs@126.com)
  • 基金资助:
    国家自然科学基金(12161080);数据恢复四川省重点实验室开放基金项目(DRN2204)

Study on Erasure Code Algorithm for Three Data Centers

SUN Jing1, NIU Hongting2, LIANG Songtao3   

  1. 1 Department of Intelligent Science and Information Law,East China University of Political Science and Law,Shanghai 201620,China
    2 School of Computer Science and Engineering,Beihang University,Beijing 100191,China
    3 Multimedia Technology Center,Shanghai Bilibili Technology Co.,Ltd,Shanghai 200433,China
  • Received:2024-10-08 Revised:2024-11-26 Online:2025-02-15 Published:2025-02-17
  • About author:SUN Jing,born in 1985,Ph.D,lecturer,is a member of CCF(No.30246M).Her main research interests include distributed storage systems,cloud computing and knowledge distillation.
  • Supported by:
    National Natural Science Foundation of China(12161080) and Open Research Fund Program of Data Recovery Key Laboratory of Sichuan Province (DRN2204).

摘要: 纠删码算法在单数据中心和多数据中心得到了广泛的应用。目前对纠删码算法的研究更多地关注存储成本和修复带宽,对于如何在专线带宽、交换机受限的情况下完成多数据中心之间的修复,如何在可靠性、容错能力等核心因素之间实现最佳权衡等问题,没有进行充分的分析和解决。针对三数据中心这种最常用的多数据中心场景,首先,提出了纠删码在系统设计中重要的4个因素:冗余度、可靠性、容错能力及解码带宽。其次,根据提出的4个因素,设计了一种单数据中心下满足最优带宽修复的S-LRC算法。再根据提出的S-LRC算法,设计了满足三中心架构体系下的G-LRC算法。相比传统的编码方案,提出的G-LRC算法具有更高的可靠性、更大的容错性及解码带宽惩罚比。其两节点故障时解码带宽惩罚比仅为传统方案的1/7~2/7。最后,将G-LRC算法在大文件存储系统中进行了实现和验证,并且设计了解码最优决策算法来减少修复的带宽,解决了非最大距离可分割码算法在系统中落地难的问题。

关键词: 三数据中心, 纠删码, 局部可修复码, 最大可恢复编码, Reed-Solomon码

Abstract: Erasure coding algorithms are widely applied in both single-data-center and multi-data-center environments.Current research on erasure coding algorithms primarily focuses on storage cost and repair bandwidth.However,challenges such as how to perform repairs across multiple data centers with limited dedicated bandwidth and switch constraints,and how to balance key factors like reliability and fault tolerance,have not been thoroughly analyzed or addressed.This study targets the three-data-center scenario,which is one of the most common multi-data-center configurations.First,it identifies four crucial factors for erasure coding in system design:redundancy,reliability,fault tolerance,and decoding bandwidth.Based on these factors,it then proposes ansingle-data-center local reconstruction code(S-LRC) algorithm that achieves optimal bandwidth repair within a single data center.Building on the S-LRC algorithm,it further develops the global local reconstruction code(G-LRC) algorithm to accommodate the architecture of a three-data-center setup.Compared to traditional coding schemes,the proposed G-LRC algorithm offers higher reliability,greater fault tolerance,and a lower decoding bandwidth penalty.Specifically,for two-node failures,the decoding bandwidth penalty of G-LRC is only 1/7~2/7 of that of traditional schemes.Finally,the G-LRC algorithm is implemented and validated in a large file storage system,where an optimal decoding decision algorithm is designed to reduce repair bandwidth,addressing the deployment challenges of non maximum distance separable(non-MDS) codes in practical systems.

Key words: Three data center, Erasure code, Local reconstruction code, Maximum recovery code, Reed-solomon code

中图分类号: 

  • TP391.4
[1]FORD D,LABELLE F,POPOVICI F I,et al.Availability inGlobally Distributed Storage Systems[C]//The 9th USENIX Symposium on Operating Systems Design and Implementation.2010:61-74.
[2]SVEND F,ARIF M,YASUSHI S,et al.A decentralized algo-rithm for erasure-coded virtual disks[C]//In International Conference on Dependable Systems and Networks.2004:125-134.
[3]SCHROEDER B,GIBSON G A.Disk failures in the real world:What does an MTTF of 1 000 000 hours mean to you? [C]//Proceedings of the 5th USENIX Conference on File and Storage Technologies.2007:1-16.
[4]SUN J,LIANG S T,LU X J.An approximately optimal disk repair algorithm for distributed storage systems[J].Science China Information Sciences,2020,50:1834-1849.
[5]HU Y,LIU Y,LI W,et al.Unequal failure protection coding technique for distributed cloud storage Systems[J].IEEE Transactions on Cloud Computing,2021,9(1):386-400.
[6]ZHANG C,XU Y, HU Y P,et al.A Blockchain-based multi-cloud storage data auditing scheme to locate faults[J].IEEE Transactions on Cloud Computing,2022,10(4):2252-2263.
[7]HUANG C,XU L.STAR:An efficient coding scheme for correcting triple storage node failures[J].IEEE Transactions on Computers,2008,57(7):889-901.
[8]KV RASHMI,NIHAR B S,GU D K,et al.A hitchhiker's guide to fast and efcient data reconstruction in erasure-coded data centers[J].ACM SIGCOMM Computer Communication Review,2014,44:331-342.
[9]IRVING S R,GUSTAVE S.Polynomial codes over certain finite fields[J].Journal of the Society for Industrial and Applied Mathematics,1960,8(2):300-304.
[10]HUANG C,SIMITCI H,XU Y,et al.Erasure coding in windows azure storage[C]//Proceedings of the 2012 USENIX Conference on Annual Technical Conference.Berkeley:USENIX Association,2012:15-26.
[11]GOPALAN P,HUANG C,JENKINS B,et al.Explicit maximally recoverable codes with locality[J].IEEE Transactions on Information Theory,2014,60(9),5245-5256.
[12]LIU K,PENG J,HUANG Z,et al.Adaptive and scalable caching with erasure codes in distributed cloud-edge storage systems[J].IEEE Transactions on Cloud Computing,2023,11(2):1840-1853.
[13]HOU H,LEE P P C.Rack-aware regenerating codes for data centers[J].IEEE Transactions on Information Theory,2019,65(8):4730-4745.
[14]SUN M,TANG D,LI Y,et al.An Erasure Code with Low Repair-Cost Based on A Combined-stripe Encoding Structure[J].International Journal of Network Security,2024,26(2):206-216.
[15]ZHOU H,FENG D.Boosting erasure-coded multi-stripe repairin rack architecture and heterogeneous clusters:design and ana-lysis[J].IEEE Transactions on Parallel and Distributed Systems,2023,34(8):2251-2264.
[16]PATIL A,RANGARAO D,SEIPP H,et al.Cloud object storage as a service:IBM cloud object storage from theory to practice-for developers,IT architects and IT specialists[EB/OL].https://www.redbooks.ibm.com.
[17]DELL TECHNICAL WHITE PAPER.Predicts 2021:Dell EMC ECS:High Availability Design[EB/OL].https://www.delltechnologies.com/asset/en-ie/products/storage/industry-market/h16344-ecs-high-availability-design.pdf.
[18]年终盘点:2023年最重大的15次云故障[EB/OL].https://weibo.com/ttarticle/p/show?id=2309404982883017425028.
[19]GREENAN K M,PLANK J S,W J J.Mean time to Meaningless:MTTDL,Markov models,and storage system reliability[C]//HotStorage.2010:1-5.
[20]YANG C C.Large scale distributed storage system:principleanalysis and architecture practice[M].Mechanical Industry Press.2013:22-23.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!