基于可用性度量的分布式文件系统节点失效恢复算法

Computer Science, 2013, Vol. 40, Issue (1): 144-149.

Node Failure Recovery Algorithm for Distributed File System Based on Measurement of Data Availability

Online: 2018-11-16 Published: 2018-11-16

Abstract

The existing strategy by which a distributed file system handles node failure consumes substantial bandwidth and disk space and affects the stability of the system. By studying the HDFS cluster structure, its data block storage mechanism, and the state relationship between nodes and blocks, we defined the cluster node matrix, node status matrix, file block partition matrix, block storage matrix, and block state matrix. These definitions make it straightforward to model the availability of data blocks. Based on this measurement of data block availability, we proposed a new node failure recovery algorithm and analyzed its performance. The experimental results show that, compared with the original strategy, the new algorithm ensures the availability of all blocks in the system, reduces the bandwidth and disk space consumed by recovery, shortens the recovery time, and improves the stability of the system.

Key words: Cloud computing, Distributed file system, Failure recovery, Measurement of data availability
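
The matrix formulation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual availability metric or the recovery algorithm, so everything below is an assumption made for illustration: availability is approximated as the number of live replicas of a block, and the names `live_replica_counts`, `blocks_to_recover`, and `min_replicas` are hypothetical. The sketch only shows how a block storage matrix and a node status vector could be combined to decide which blocks need recovery after a node fails.

```python
from typing import List


def live_replica_counts(storage: List[List[int]], node_up: List[int]) -> List[int]:
    """Count live replicas per block.

    storage[b][n] is 1 if block b has a replica on node n (a stand-in for a
    block storage matrix); node_up[n] is 1 if node n is alive (a stand-in
    for a node status vector). Both layouts are assumptions.
    """
    return [sum(s * u for s, u in zip(row, node_up)) for row in storage]


def blocks_to_recover(
    storage: List[List[int]], node_up: List[int], min_replicas: int
) -> List[int]:
    """Return indices of blocks whose live replica count is below min_replicas.

    Using a replica-count threshold as the availability test is a
    simplification, not the measurement defined in the paper.
    """
    counts = live_replica_counts(storage, node_up)
    return [b for b, c in enumerate(counts) if c < min_replicas]


# Three blocks stored across four nodes; node 1 has failed.
storage = [
    [1, 1, 0, 0],  # block 0 on nodes 0 and 1
    [0, 1, 1, 0],  # block 1 on nodes 1 and 2
    [0, 0, 1, 1],  # block 2 on nodes 2 and 3
]
node_up = [1, 0, 1, 1]

counts = live_replica_counts(storage, node_up)
pending = blocks_to_recover(storage, node_up, min_replicas=2)
```

Under these assumptions, only blocks 0 and 1 fall below the threshold and would be scheduled for re-replication, while block 2 is left alone. Recovering only the blocks that fail the availability test, rather than every block held by the failed node, is one way recovery traffic and disk usage could be reduced.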