Computer Science ›› 2025, Vol. 52 ›› Issue (2): 48-57.doi: 10.11896/jsjkx.241000022

• Database & Big Data & Data Science • Previous Articles     Next Articles

Study on Erasure Code Algorithm for Three Data Centers

SUN Jing1, NIU Hongting2, LIANG Songtao3   

  1. 1 Department of Intelligent Science and Information Law,East China University of Political Science and Law,Shanghai 201620,China
    2 School of Computer Science and Engineering,Beihang University,Beijing 100191,China
    3 Multimedia Technology Center,Shanghai Bilibili Technology Co.,Ltd,Shanghai 200433,China
  • Received:2024-10-08 Revised:2024-11-26 Online:2025-02-15 Published:2025-02-17
  • About author:SUN Jing,born in 1985,Ph.D,lecturer,is a member of CCF(No.30246M).Her main research interests include distributed storage systems,cloud computing and knowledge distillation.
  • Supported by:
    National Natural Science Foundation of China(12161080) and Open Research Fund Program of Data Recovery Key Laboratory of Sichuan Province (DRN2204).

Abstract: Erasure coding algorithms are widely applied in both single-data-center and multi-data-center environments.Current research on erasure coding algorithms primarily focuses on storage cost and repair bandwidth.However,challenges such as how to perform repairs across multiple data centers with limited dedicated bandwidth and switch constraints,and how to balance key factors like reliability and fault tolerance,have not been thoroughly analyzed or addressed.This study targets the three-data-center scenario,which is one of the most common multi-data-center configurations.First,it identifies four crucial factors for erasure coding in system design:redundancy,reliability,fault tolerance,and decoding bandwidth.Based on these factors,it then proposes ansingle-data-center local reconstruction code(S-LRC) algorithm that achieves optimal bandwidth repair within a single data center.Building on the S-LRC algorithm,it further develops the global local reconstruction code(G-LRC) algorithm to accommodate the architecture of a three-data-center setup.Compared to traditional coding schemes,the proposed G-LRC algorithm offers higher reliability,greater fault tolerance,and a lower decoding bandwidth penalty.Specifically,for two-node failures,the decoding bandwidth penalty of G-LRC is only 1/7~2/7 of that of traditional schemes.Finally,the G-LRC algorithm is implemented and validated in a large file storage system,where an optimal decoding decision algorithm is designed to reduce repair bandwidth,addressing the deployment challenges of non maximum distance separable(non-MDS) codes in practical systems.

Key words: Three data center, Erasure code, Local reconstruction code, Maximum recovery code, Reed-solomon code

CLC Number: 

  • TP391.4
[1]FORD D,LABELLE F,POPOVICI F I,et al.Availability inGlobally Distributed Storage Systems[C]//The 9th USENIX Symposium on Operating Systems Design and Implementation.2010:61-74.
[2]SVEND F,ARIF M,YASUSHI S,et al.A decentralized algo-rithm for erasure-coded virtual disks[C]//In International Conference on Dependable Systems and Networks.2004:125-134.
[3]SCHROEDER B,GIBSON G A.Disk failures in the real world:What does an MTTF of 1 000 000 hours mean to you? [C]//Proceedings of the 5th USENIX Conference on File and Storage Technologies.2007:1-16.
[4]SUN J,LIANG S T,LU X J.An approximately optimal disk repair algorithm for distributed storage systems[J].Science China Information Sciences,2020,50:1834-1849.
[5]HU Y,LIU Y,LI W,et al.Unequal failure protection coding technique for distributed cloud storage Systems[J].IEEE Transactions on Cloud Computing,2021,9(1):386-400.
[6]ZHANG C,XU Y, HU Y P,et al.A Blockchain-based multi-cloud storage data auditing scheme to locate faults[J].IEEE Transactions on Cloud Computing,2022,10(4):2252-2263.
[7]HUANG C,XU L.STAR:An efficient coding scheme for correcting triple storage node failures[J].IEEE Transactions on Computers,2008,57(7):889-901.
[8]KV RASHMI,NIHAR B S,GU D K,et al.A hitchhiker's guide to fast and efcient data reconstruction in erasure-coded data centers[J].ACM SIGCOMM Computer Communication Review,2014,44:331-342.
[9]IRVING S R,GUSTAVE S.Polynomial codes over certain finite fields[J].Journal of the Society for Industrial and Applied Mathematics,1960,8(2):300-304.
[10]HUANG C,SIMITCI H,XU Y,et al.Erasure coding in windows azure storage[C]//Proceedings of the 2012 USENIX Conference on Annual Technical Conference.Berkeley:USENIX Association,2012:15-26.
[11]GOPALAN P,HUANG C,JENKINS B,et al.Explicit maximally recoverable codes with locality[J].IEEE Transactions on Information Theory,2014,60(9),5245-5256.
[12]LIU K,PENG J,HUANG Z,et al.Adaptive and scalable caching with erasure codes in distributed cloud-edge storage systems[J].IEEE Transactions on Cloud Computing,2023,11(2):1840-1853.
[13]HOU H,LEE P P C.Rack-aware regenerating codes for data centers[J].IEEE Transactions on Information Theory,2019,65(8):4730-4745.
[14]SUN M,TANG D,LI Y,et al.An Erasure Code with Low Repair-Cost Based on A Combined-stripe Encoding Structure[J].International Journal of Network Security,2024,26(2):206-216.
[15]ZHOU H,FENG D.Boosting erasure-coded multi-stripe repairin rack architecture and heterogeneous clusters:design and ana-lysis[J].IEEE Transactions on Parallel and Distributed Systems,2023,34(8):2251-2264.
[16]PATIL A,RANGARAO D,SEIPP H,et al.Cloud object storage as a service:IBM cloud object storage from theory to practice-for developers,IT architects and IT specialists[EB/OL].https://www.redbooks.ibm.com.
[17]DELL TECHNICAL WHITE PAPER.Predicts 2021:Dell EMC ECS:High Availability Design[EB/OL].https://www.delltechnologies.com/asset/en-ie/products/storage/industry-market/h16344-ecs-high-availability-design.pdf.
[18]年终盘点:2023年最重大的15次云故障[EB/OL].https://weibo.com/ttarticle/p/show?id=2309404982883017425028.
[19]GREENAN K M,PLANK J S,W J J.Mean time to Meaningless:MTTDL,Markov models,and storage system reliability[C]//HotStorage.2010:1-5.
[20]YANG C C.Large scale distributed storage system:principleanalysis and architecture practice[M].Mechanical Industry Press.2013:22-23.
[1] ZHANG Yushu, HE Xiaotong, XIAO Xiangli, ZHU Youwen, WANG Liangming. Grouping Storage Optimization Method for Blockchain Ledger Based on Erasure Code [J]. Computer Science, 2023, 50(10): 350-361.
[2] ZHANG Hang, TANG Dan, CAI Hong-liang. Study on Predictive Erasure Codes in Distributed Storage System [J]. Computer Science, 2021, 48(5): 130-139.
[3] MA Liang-li and LIU Qing. Researches of Redundancy Coding Technologies on Reducing Reconstruction Data Amount [J]. Computer Science, 2017, 44(Z6): 463-469.
[4] WU Yang, FU Yin-jin, CHEN Wei-wei and NI Gui-qiang. Efficient Mechanism of Hybrid Memory Placement and Erasure Code [J]. Computer Science, 2017, 44(6): 57-62.
[5] JIN Xing-tong, LI Peng, WANG Gang, LIU Xiao-guang and LI Zhong-wei. Optimizing Small XOR-based Non-systematic Erasure Codes [J]. Computer Science, 2017, 44(6): 36-42.
[6] WU Qiu-ping, LIU Bo and LIN Wei-wei. Adaptive Replacement Cache Mechanism for Fault Tolerance in Cloud Storage [J]. Computer Science, 2015, 42(Z6): 332-336.
[7] DU Jun-zhao , LIU Hui , LI Xiao-jun , ZHANG Ying-jun , ZHANG Yun-yang. Performance Evaluation of Information Dissemination Protocol in WSNs Based on RS Erasure Codes [J]. Computer Science, 2011, 38(Z10): 315-318.
[8] JIANG Guo-song,ZOU Chen,XIE Chang-sheng. Research on Matrix Reconstruction in RAID Controller [J]. Computer Science, 2009, 36(7): 262-266.
[9] LIU Gang, ZHOU Jing-Li, Qin Lei-Hua ,CHEN Xiao-Ping (Computer Department of Huazhong University of Science and Technology, Wuhan 430074). [J]. Computer Science, 2007, 34(5): 75-78.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!