Computer Science, 2021, Vol. 48, Issue (5): 130-139. doi: 10.11896/jsjkx.200300124

• Database & Big Data & Data Science •

Study on Predictive Erasure Codes in Distributed Storage System

ZHANG Hang, TANG Dan, CAI Hong-liang   

  1. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
  • Received: 2020-03-23 Revised: 2020-07-20 Online: 2021-05-15 Published: 2021-05-09
  • About author: ZHANG Hang, born in 1995, postgraduate. His main research interests include coding theory and distributed storage systems. (1521717495@qq.com)
    TANG Dan, born in 1982, Ph.D., professor, is a member of China Computer Federation. His main research interests include coding theory and distributed storage systems.
  • Supported by:
    Science and Technology Program of Sichuan Province (20ZDYF1156), Major AI Projects (2018GZDZX0030) and Sichuan Science and Technology Achievements Transfer and Transformation Demonstration Project (2018CC0093).

Abstract: Erasure coding consumes less storage space while achieving higher data reliability, and is therefore widely used in distributed storage systems. However, the high cost of repairing data with erasure codes limits their application. To reduce this repair cost, researchers have studied block codes and regenerating codes extensively. Both, however, are passive fault-tolerance schemes. For nodes that are prone to failure, active fault tolerance can better reduce repair costs and maintain system reliability. This paper therefore proposes a proactive basic-Pyramid (PPyramid) code. The PPyramid code uses hard disk failure prediction to adjust the association between redundant blocks and data blocks in the Pyramid code, placing the hard disks that are predicted to fail in the same group so that all read operations during data recovery are performed within that group, which reduces the number of data blocks read and lowers the repair cost. In a Ceph-based distributed storage system, the PPyramid code is compared with other commonly used erasure codes when repairing multiple hard disks. Experimental results show that PPyramid codes reduce repair costs by 6.3%~34.9% and repair time by 7.6%~63.6% compared with basic-Pyramid codes. Compared with LRC, pLRC, SHEC and DLRC codes, they reduce repair costs by 8.6%~52% and repair time by 10.8%~52.4%. Moreover, PPyramid codes are flexible to construct and have strong practical value.

Key words: Data repair, Distributed storage system, Erasure codes, Failure prediction, Hard disk failure

CLC Number: TP302.8
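
The grouping idea summarized in the abstract can be illustrated with a toy cost model. The sketch below is not the paper's construction: the `Group` type, the `repair_reads` function, the 12-block layouts and the rule that a global decode reads `k` blocks are all assumptions made for illustration. It only counts how many blocks are read when two disks fail, first with the failures spread over two local groups and then with the disks predicted to fail placed in one group that holds two local parities. Both layouts use the same total number of local parities.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Group:
    """Hypothetical local group: data block ids plus a count of local parities."""
    data: FrozenSet[int]
    local_parities: int


def repair_reads(groups: List[Group], failed: Set[int], k: int) -> int:
    """Count blocks read to repair `failed` data blocks in this toy model.

    Assumption: a group with f lost blocks and at least f local parities
    is repaired locally by reading its surviving data blocks plus f parities.
    Otherwise the repair falls back to a global decode, modelled as k reads.
    """
    total = 0
    for group in groups:
        lost = group.data & failed
        if not lost:
            continue
        if len(lost) > group.local_parities:
            return k  # local parities cannot cover the loss
        total += (len(group.data) - len(lost)) + len(lost)
    return min(total, k)


K = 12  # number of data blocks (assumed)

# Baseline layout: three groups of four, one local parity each.
baseline = [
    Group(frozenset({0, 1, 2, 3}), 1),
    Group(frozenset({4, 5, 6, 7}), 1),
    Group(frozenset({8, 9, 10, 11}), 1),
]

# Prediction-aware layout: disks 0 and 4 are predicted to fail, so they
# share one group that receives two of the three local parities.
predictive = [
    Group(frozenset({0, 4, 1, 2}), 2),
    Group(frozenset({3, 5, 6, 7}), 1),
    Group(frozenset({8, 9, 10, 11}), 0),
]

failed = {0, 4}
print(repair_reads(baseline, failed, K))    # 8
print(repair_reads(predictive, failed, K))  # 4
```

In this model the prediction-aware layout halves the reads for the predicted failure pattern, at the price of leaving the last group with no local parity, so a failure there would need a global decode. The actual savings reported in the abstract come from measurements on Ceph, not from this model.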