Computer Science ›› 2019, Vol. 46 ›› Issue (8): 35-41.doi: 10.11896/j.issn.1002-137X.2019.08.006

• Big Data & Data Science • Previous Articles     Next Articles

Node Selection Scheme for Data Repair in Heterogeneous Distributed Storage Systems

ZHONG Feng-yan, WANG Yan, LI Nian-shuang   

  1. (School of Software,East China Jiaotong University,Nanchang 330013,China)
  • Received:2018-07-12 Online:2019-08-15 Published:2019-08-15

Abstract: In recent years,the growth of massive data poses severe challenges to existing storage systems,including storage cost and data reliability requirements.Because erasure code can provide higher data reliability under the same storage overhead,it has been paid wide attention.However,the coding characteristics of erasure code increases the extra overhead for the storage system using erasure code,in the process of data repair,such as computing,scheduling,transmission,disk reading and writing,and so on.In recent years,the study of erasure code data recovery is based on the assumption that each node in the distributed storage system is indiscriminate.In a large scale of data center,however,equipment replacement,hardware failure and other reasons may not only cause data loss,but also lead to different sto-rage cost of each storage node in the data center,so that the amount of data stored on each storage node is not always the same,this phenomenon is called storage capacity isomerism.The repair process under the heterogeneous storage capacity is faced with the selection of the providers.It is necessary to design a node selection strategy to make the repair cost lower,and improve the reliability and availability of the storage system.Based on the different transformation cost of nodes participating in repair in the actual repair process of data,this paper proposed a node selection strategy,namely tree topology repair algorithm,to reduce the cost of repair in the whole repair process.The simulation results show that the proposed tree selection strategy can further reduce the cost of data repair compared with the fixed node selection strategy of IFR code

Key words: Distributed storage system, Node heterogeneity, Regenerating code, Data repair

CLC Number: 

  • TP309.3
[1] HUANG C,SIMITCI H,XU Y,et al.Erasure coding in windows azure storage[C]∥Usenix Conference on Technical Conference.USENIX Association,2012:2.
[2] BEAVER D,KUMAR S,LI H C,et al.Finding a needle in Haystack:facebook’s photo storage[C]∥Usenix Conference on Operating Systems Design and Implementation.USENIX Association,2010:47-60.
[3] RASHMI K V,BORTHAKUR D,BORTHAKUR D,et al.A “hitchhiker’s” guide to fast and efficient data reconstruction in erasure-coded data centers[C]∥ACM Conference on SIGCOMM.ACM,2014:331-342.
[4] 杨传辉.大规模分布式存储系统:原理解析与架构实战[M].北京:机械工业出版社,2013.
[5] ZOU S C,WANG H Q,FENG G S,et al.Multi-strategy Trust Evolution Model for Cognitive relay Network Based on Moran Process[J].Journal of Chinese Computer Systems,2014,35(10):2209-2214.(in Chinese) 邹世辰,王慧强,冯光升,等.基于Moran过程的认知中继网络多策略信任演化模型[J].小型微型计算机系统,2014,35(10):2209-2214.
[6] SCHROEDER B,GIBSON G.Disk failures in the real world: What does an mttf of 1000000 hours mean to you?[C]∥Proc.of USENIX FAST.2007.
[7] SONG Y,ZHANG F,SHAO Y,et al.Energy efficiency optimization of cognitive relay network based on cooperative spectrum sensing[J].The Journal of China Universities of Posts and Telecommunications,2015,22(3):26-34.
[8] BANG J,LEE J,KIM S,et al.An Efficient Relay Selection Strategy for Random Cognitive Relay Networks[J].IEEE Transactions on Wireless Commsunications,2015,14(3):1555-1566.
[9] BORTHAKUR D.HDFS architecture guide[OL].[2013-08- 04].
[10] ASTERIS M,ASTERIS M,PAPAILIOPOULOS D,et al.XORing elephants:novel erasure codes for big data[J].Proceedings of the VLDB Endowment,2013,6(5):325-336.
[11] DIMAKIS A G,GODFREY P B,WU Y,et al.Network coding for distributed storage systems[J].IEEE Transactions on Information Theory,2010,56(9):4539-4551.
[12] DIMAKIS A G,BRIGHTEN P,WAINWRIGHT M J,et al.The Benefits of Network Coding for Peer-to-Peer Storage Systems[C]∥Proceedings of the International Conference on Computer Communications (INFOCOM).2007.
[13] WANG Y,YIN X,WANG X.Two New Classes of Two-Parity MDS Array Codes With Optimal Repair[J].IEEE Communications Letters,2016,20(7):1293-1296.
[14] SHEN Z,LEE P P C,SHU J,et al.Encoding-Aware Data Placement for Efficient Degraded Reads in XOR-Coded Storage Systems[C]∥Reliable Distributed Systems.IEEE,2016:239-248.
[15] YE L,FENG D,HU Y,et al.Hybrid-RC:Flexible Erasure Codes with Optimized Recovery Performance and Low Storage Overhead[C]∥Reliable Distributed Systems.IEEE,2017:124-133.
[16] WANG J,LUO W,OUYANG M S,et al.Segmentation Coding Scheme Based on Simple Regenerating Codes[J].Computer Science,2016,43(8):148-153.(in Chinese) 王静,罗威,欧阳明生,等.基于简单再生码的分段编码方案[J].计算机科学,2016,43(8):148-153.
[17] AKHLAGHI S,KIANI A,GHANAVATI M R.Cost-ban- dwidth tradeoff in distributed storage systems[J].Computer Communications,2010,33(17):2105-2115.
[18] GERAMI M,XIAO M,SKOGLUND M.Optimal-cost repair in multi-hop distributed storage systems[C]∥IEEE International Symposium on Information Theory Proceedings.IEEE Xplore,2012:1437-1441.
[19] LI S T,JIN X.Low-cost cloud storage scheme based on hybrid strategy[J].Journal of Computer Applications,2014,34(10):2800-2805.(in Chinese) 李松涛,金欣.基于混合策略的低成本云存储方案[J].计算机应用,2014,34(10):2800-2805.
[20] DU Y,XIONG R,JIN J,et al.A Cost-Efficient Data Placement Algorithm with High Reliability in Hadoop[C]∥International Conference on Advanced Cloud & Big Data.IEEE Computer Society,2017:100-105.
[21] YU Q,SHUM K W,CHI W S.Minimization of Storage Cost in Distributed Storage Systems with Repair Consideration[C]∥Global Telecommunications Conference.IEEE,2012:1-5.
[22] YU Q,CHI W S,CHAN T H.Irregular Fractional Repetition Code Optimization for Heterogeneous Cloud Storage[J].IEEE Journal on Selected Areas in Communications,2014,32(5):1048-1060.
[23] LI J,YANG S,WANG X,et al.Tree-structured data regeneration in distributed storage systems with regenerating codes[C]∥Proc of the IEEE INFOCOM.Piscataway,NJ:IEEE,2010:1-9.
[24] XIA M,SAXENA M,BLAUM M,et al.A tale of two erasure codes in HDFS[C]∥Usenix Conference on File and Storage Technologies.USENIX Association,2015:213-226.
[25] CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to Algorithms(Second Edition)[M].DBLP,2001.
[26] GONG Q,WANG J,WANG Y,et al.Topology-Aware Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems[J].arXiv preprint arXiv:1506.05579,2015.
[27] SHAH N B,RASHMI K V,KUMAR P V.A flexible class of regenerating codes for distributed storage[C]∥IEEE International Symposium on Information Theory.IEEE,1947:1943-1947.
[28] LI J,YANG S,WANG X,et al.Tree-structured data regeneration with network coding in distributed storage systems[C]∥International Workshop on Quality of Service.IEEE,2009:2892-2900.
[1] WANG Xue-bing. Minimum Storage Regenerating Code with Variable Parameters [J]. Computer Science, 2020, 47(6A): 305-309.
[2] LIU Qin. Study on Data Quality Based on Constraint in Computer Forensics [J]. Computer Science, 2018, 45(4): 169-172.
[3] DING Bing-chen and LI Wei-zhong. Partially Regenerating Codes Combined with Replicas [J]. Computer Science, 2016, 43(9): 203-208.
[4] WANG Jing, LUO Wei, OUYANG Ming-sheng, JIANG Can and WANG Xin-mei. Segmentation Coding Scheme Based on Simple Regenerating Codes [J]. Computer Science, 2016, 43(8): 148-153.
[5] LIU Bo, CAI Mei and ZHOU Xu-chuan. Study on Data Repair and Consistency Query Processing [J]. Computer Science, 2016, 43(1): 232-236.
[6] WANG Yu,ZHAO Yue-long and HOU Fang. Minimum Redundancy Storage Regeneration Code Research MSRRC Based on Matrix Operation [J]. Computer Science, 2014, 41(Z11): 191-194.
[7] . [J]. Computer Science, 2007, 34(6): 47-48.
Full text



[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75 .
[2] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[3] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[4] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[5] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99 .
[6] ZHOU Yan-ping and YE Qiao-lin. L1-norm Distance Based Least Squares Twin Support Vector Machine[J]. Computer Science, 2018, 45(4): 100 -105 .
[7] LIU Bo-yi, TANG Xiang-yan and CHENG Jie-ren. Recognition Method for Corn Borer Based on Templates Matching in Muliple Growth Periods[J]. Computer Science, 2018, 45(4): 106 -111 .
[8] GENG Hai-jun, SHI Xin-gang, WANG Zhi-liang, YIN Xia and YIN Shao-ping. Energy-efficient Intra-domain Routing Algorithm Based on Directed Acyclic Graph[J]. Computer Science, 2018, 45(4): 112 -116 .
[9] CUI Qiong, LI Jian-hua, WANG Hong and NAN Ming-li. Resilience Analysis Model of Networked Command Information System Based on Node Repairability[J]. Computer Science, 2018, 45(4): 117 -121 .
[10] WANG Zhen-chao, HOU Huan-huan and LIAN Rui. Path Optimization Scheme for Restraining Degree of Disorder in CMT[J]. Computer Science, 2018, 45(4): 122 -125 .