Computer Science ›› 2013, Vol. 40 ›› Issue (Z11): 270-273.

• Data Storage and Mining •

Data Placement Algorithm for Large-scale Storage System

ZHENG Sheng and LI Tong

  1. School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: With the arrival of the big data era, datasets at the PB, EB, and even ZB scale have emerged, and storage systems must be expanded gradually as business needs grow. The addition of storage devices with different performance, the retirement of old devices, and the simultaneous failure of multiple devices pose serious challenges to the data distribution algorithms of traditional storage systems. This paper proposes a new hash mapping algorithm that introduces node weights and multiple replicas and takes node failure and node overload into account, so that it can adapt to a dynamic environment of storage expansion, node failure, and node overload. When the system scales, the algorithm guarantees probabilistically that a data object and its replicas are placed on different nodes, that the distribution across nodes stays balanced, and that the amount of migrated data is optimal. The algorithm also handles node failure and node overload during system operation, which improves the availability and performance of the system. Mathematical analysis and experiments verify that the algorithm adapts automatically to the scaling of the storage system, keeps the data distribution even, and handles node failure and overload effectively.

Key words: Distributed file system, Scalability, Data placement, Data migration
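
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the properties it names (node weights, replicas on distinct nodes, limited migration when nodes fail). It uses weighted rendezvous hashing, a well-known technique that is swapped in here and is not claimed to be the authors' method. The names `place`, `_unit_hash`, `weights`, and `excluded` are hypothetical.

```python
import hashlib
import math


def _unit_hash(key: str, node: str) -> float:
    """Map a (key, node) pair to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{key}|{node}".encode("utf-8")).digest()
    # Keep 52 bits so that value + 0.5 is exactly representable as a float.
    value = int.from_bytes(digest[:8], "big") >> 12
    return (value + 0.5) / 2**52


def place(key, weights, replicas=2, excluded=frozenset()):
    """Return `replicas` distinct nodes for `key` (illustrative only).

    weights:  dict of node name -> positive weight (assumed capacity measure)
    excluded: nodes to skip, e.g. failed or overloaded ones (assumption)
    """
    candidates = [n for n, w in weights.items() if w > 0 and n not in excluded]
    if len(candidates) < replicas:
        raise ValueError("not enough usable nodes for the requested replicas")

    # Weighted rendezvous score: -w / ln(h), with h in (0, 1).
    # A larger weight makes a node more likely to rank first.
    def score(node):
        return -weights[node] / math.log(_unit_hash(key, node))

    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    nodes = {"n1": 1.0, "n2": 2.0, "n3": 1.0, "n4": 4.0}
    print(place("object-42", nodes, replicas=3))
    # Simulate n3 failing: only objects that used n3 get a new node.
    print(place("object-42", nodes, replicas=3, excluded={"n3"}))
```

Because each object ranks nodes independently, excluding a node changes the placement only for objects that had a replica on it, and taking the top-ranked entries ensures the replicas land on different nodes.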

