计算机科学 ›› 2021, Vol. 48 ›› Issue (5): 130-139.doi: 10.11896/jsjkx.200300124
张航, 唐聃, 蔡红亮
ZHANG Hang, TANG Dan, CAI Hong-liang
摘要: 纠删码消耗的存储空间较少,获得的数据可靠性较高,因此被分布式存储系统广泛采用。但纠删码在修复数据时较高的修复成本限制了其应用。为了降低纠删码的修复成本,研究人员在分组码和再生码上进行了大量的研究。由于分组码和再生码属于被动容错方式,对于一些容易出现失效的节点,采用主动容错的方式能更好地降低修复成本,维护系统的可靠性,因此,提出了一种主动容错的预测式纠删(Proactive basic-Pyramid,PPyramid)码。PPyramid码利用硬盘故障预测方法来调整basic-Pyramid码中冗余块和数据块之间的关联,将预测出的即将出现故障的硬盘划分到同一小组,使得在修复数据时,所有的读取操作在小组内进行,从而减少读取数据块的个数,节省修复成本。在基于Ceph搭建的分布式存储系统中,在修复多个硬盘故障时,将PPyramid码与其他常用的纠删码进行对比。实验结果表明,相比basic-Pyramid码,PPyramid码能降低6.3%~34.9%的修复成本和减少7.6%~63.6%的修复时间,相比LRC码、pLRC码、SHEC码、DLRC码,能降低8.6%~52%的修复成本和减少10.8%~52.4%的修复时间。同时,PPyramid码构造灵活,具有很强的实际应用价值。
中图分类号:
[1]Daily economic news,China's total data will account for 20% of global data in 2020.Information infrastructure protection is the key to big data security[EB/OL].https://baijiahao.baidu.com/s?id=1601722855211864246&wfr=spider&for=pc. [2]WANG Y J,SUN W,ZHOU S,et al.Key technologies of distributed storage in cloud computing environment [J].Journal of Software,2012,(4):232-256. [3]WANG Y J,LI S.Research and performance evaluation of data replication technology in distributed storage systems[J].Computers & Mathematics with Applications,2006,51(11):1625-1632. [4]HUANG C,SIMITCI H,XU Y,et al.Erasure coding in win-dows azure storage[C]//Proceedings of the 2012 USENIX Conference on Annual Technical Conference.USA:USENIX Association,2012:2-2. [5]HUANG C,CHEN M,LI J.Pyramid codes:Flexible schemes totrade space for access efficiency in reliable data storage systems[C]//Sixth IEEE International Symposium on Network Computing and Applications(NCA 2007).Piscataway:IEEE,2007:79-86. [6]SATHIAMOORTHY M,ASTERIS M,PAPAILIOPOULOSD,et al.XORing Elephants:Novel Erasure Codes for Big Data[C]//VLDB2013:Proceedings of the 39th International Confe-rence on Very Large Data Bases.Trento:VLDB Endowment,2013:325-336. [7]ZHOU S,WANG Y J.EXPyramid:An Array-Based FlexibleCoding Scheme with High Fault-Tolerance and Low Recovery-Overhead[J].Journal of Computer Research & Development,2011,48(s1):30-36. [8]MENG Y,ZHANG L,XU D,et al.A Dynamic Erasure Code Based on Block Code[C]//Proceedings of the 2019 International Conference on Embedded Wireless Systems and Networks.USA:Junction Publishing,2019:379-383. [9]MIYAMAE T,NAKAO T,SHIOZAWA K.Erasure code with shingled local parity groups for efficient recovery from multiple disk failures[C]//HotDep 2014:Proceedings of the 10th Workshop on Hot Topics in System Dependability.USA:USENIX Association,2014:5-5. [10]HAFNER,JAMES L.WEAVER Codes:Highly Fault Tolerant Erasure Codes for Storage Systems[C]//Proceedings of the FAST '05 Conference on File and Storage Technologies.USA:USENIX Association,2005:16-16. [11]RASHMI K V,SHAH N B,KUMAR P V,et al.Explicit construction of optimal exact regenerating codes for distributed storage[C]//2009 47th Annual Allerton Conference on Communication,Control,and Computing(Allerton).Piscataway:IEEE,2009:1243-1249. [12]WU Y,DIMAKIS A G.Reducing repair traffic for erasure co-ding-based storage via interference alignment[C]//ISIL2009:Proceedings of the 2009 IEEE international conference on Symposium on Information Theory.Piscataway:IEEE,2009:2276-2280. [13]SCHROEDER B,GIBEON G A.Disk failures in the real world:What does an MTTF of 1 000 000 hours mean to you?[J].ACM Transactions on Storag,2007,7(1):1-16. [14]HUGHES G F,MURRAY J F,KREUTZDELGADO K,et al.Improved disk-drive failure warnings[J].IEEE Transactions on Reliability,2002,51(3):350-357. [15]LI P,LI J,STONES R J,et al.ProCode:A Proactive Erasure Coding Scheme for Cloud Storage Systems[C]//Proceedings of the 2016 IEEE 35th Symposium on Reliable Distributed Systems.Piscataway:IEEE,2016:219-228. [16]HU Y,LIU Y,LI W,et al.Unequal Failure Protection Coding Technique for Distributed Cloud Storage Systems[J].IEEE Transactions on Cloud Computing,2017(99):1-1. [17]ZHANG X Y,XU J,HU Y.Predictive Local Repair Codes in Cloud Storage Systems [J].Journal of Computer Research and Development,2019,56(9):1988-2000. [18]ZHU B,WANG G,LIU X,et al.Proactive drive failure prediction for large scale storage systems[C]//Proceedings of the 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies.Piscataway:IEEE,2013:1-5. [19]HAMERLY G,ELKAN C.Bayesian approaches to failure prediction for disk drives[C]//Proceedings of the Eighteenth International Conference on Machine Learning.USA:Morgan Kaufmann Publishers Inc,2001,1:202-209. [20]HUGHES G F,MURRAY J F,KREUTZ-DELGADO K,et al.Improved disk-drive failure warnings[J].IEEE transactions on reliability,2002,51(3):350-357. [21]MURRAY J F,HUGHES G F,KREUTZ-DELGADO K.Machine Learning Methods for Predicting Failures in Hard Drives:A Multiple-Instance Application[J].Journal of Machine Lear-ning Research,2005,6(1):783-816. [22]LI J,JI X,JIA Y,et al.Hard Drive Failure Prediction Using Classification and Regression Trees[C]//Proceedings of the 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.Piscataway:IEEE,2014:383-394. [23]CIDON A,ESCRIVA R,KATTI S,et al.Tiered replication:acost-effective alternative to full cluster geo-replication[C]//Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference.USA:USENIX Association,2015:31-43. [24]FORD D,LABELLE F,POPOVICI F I,et al.Availability inGlobally Distributed Storage Systems.[C]//Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation.USA:USENIX Association,2010:61-74. [25]SILBERSTEIN M,GANESH L,WANG Y,et al.Lazy means smart:Reducing repair bandwidth costs in erasure-coded distri-buted storage[C]//Proceedings of International Conference on Systems and Storage.New York:Association for Computing Machinery,2014:1-7. |
[1] | 曾友渝, 谢强. 基于改进RNN和VAR的船舶设备故障预测方法 Fault Prediction Method Based on Improved RNN and VAR for Ship Equipment 计算机科学, 2021, 48(6): 184-189. https://doi.org/10.11896/jsjkx.200700117 |
[2] | 张晓, 张思蒙, 石佳, 董聪, 李战怀. Ceph分布式存储系统性能优化技术研究综述 Review on Performance Optimization of Ceph Distributed Storage System 计算机科学, 2021, 48(2): 1-12. https://doi.org/10.11896/jsjkx.201000149 |
[3] | 钟凤艳, 王艳, 李念爽. 异构分布式存储系统再生码数据修复的节点选择方案 Node Selection Scheme for Data Repair in Heterogeneous Distributed Storage Systems 计算机科学, 2019, 46(8): 35-41. https://doi.org/10.11896/j.issn.1002-137X.2019.08.006 |
[4] | 王楠, 孙善武. 基于半监督聚类分析的无人机故障识别 UAV Fault Recognition Based on Semi-supervised Clustering 计算机科学, 2019, 46(6A): 192-195. |
[5] | 刘琴. 计算机取证过程中基于约束的数据质量问题研究 Study on Data Quality Based on Constraint in Computer Forensics 计算机科学, 2018, 45(4): 169-172. https://doi.org/10.11896/j.issn.1002-137X.2018.04.028 |
[6] | 吴炀,付印金,陈卫卫,倪桂强. 一种高效的混合内存布局机制与编码技术 Efficient Mechanism of Hybrid Memory Placement and Erasure Code 计算机科学, 2017, 44(6): 57-62. https://doi.org/10.11896/j.issn.1002-137X.2017.06.009 |
[7] | 金星彤,李鹏,王刚,刘晓光,李忠伟. 基于异或的隐私保护码优化研究 Optimizing Small XOR-based Non-systematic Erasure Codes 计算机科学, 2017, 44(6): 36-42. https://doi.org/10.11896/j.issn.1002-137X.2017.06.006 |
[8] | 刘波,蔡美,周绪川. 数据修复与一致性查询处理研究 Study on Data Repair and Consistency Query Processing 计算机科学, 2016, 43(1): 232-236. https://doi.org/10.11896/j.issn.1002-137X.2016.01.050 |
[9] | 谢晨,王睿智,李 飏,苗夺谦,焦 娜. 数据驱动的燃气涡轮机跳闸预警方法的研究 Research of Data Driven Method for Gas Turbine Trip Prediction 计算机科学, 2015, 42(6): 23-27. https://doi.org/10.11896/j.issn.1002-137X.2015.06.005 |
[10] | 王禹,赵跃龙,侯昉. 基于矩阵运算的最小冗余存储再生码MSRRC研究 Minimum Redundancy Storage Regeneration Code Research MSRRC Based on Matrix Operation 计算机科学, 2014, 41(Z11): 191-194. |
[11] | 陈达智,赵荣彩,姚远,韩林. MPI自动并行化编译系统中消息传递代码生成算法 Message-passing Code Generation Algorithm in the MPI Automatic Parallelizing Compilation System 计算机科学, 2012, 39(6): 301-304. |
[12] | 杜军朝,刘惠,李晓军,张荧俊,张云扬. 基于RS纠删码的无线传感器网络信息分发协议性能评价 Performance Evaluation of Information Dissemination Protocol in WSNs Based on RS Erasure Codes 计算机科学, 2011, 38(Z10): 315-318. |
[13] | 姜国松,邹辰,谢长生. RAID控制器中矩阵重构方法研究 Research on Matrix Reconstruction in RAID Controller 计算机科学, 2009, 36(7): 262-266. https://doi.org/10.11896/j.issn.1002-137X.2009.07.064 |
[14] | . P2P分布式存储系统 计算机科学, 2007, 34(6): 47-48. |
[15] | . 基于专家知识库属性重要度的故障诊断方法研究 计算机科学, 2007, 34(4): 231-233. |
|