Computer Science (计算机科学), 2017, Vol. 44, Issue (5): 178-183. doi: 10.11896/j.issn.1002-137X.2017.05.032

• Software and Database Technology •

Dynamic Load Balance Algorithm for Big-data Distributed Storage

ZHANG Li-zong, CUI Yuan, LUO Guang-chun, CHEN Ai-guo, LU Guo-ming and WANG Xiao-xue

  School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
  • Online: 2018-11-13  Published: 2018-11-13
  • Funding: Supported by the Applied Basic Research Program of the Sichuan Provincial Department of Science and Technology (2015JY0228), the Science and Technology Support Program (2015SZ0045, 2014GZ0174), the UESTC Fundamental Research Fund (ZYGX2015J063), and the Scientific Research Start-up Fund for Returned Overseas Scholars.

Abstract: With the arrival of the big data era, distributed storage technology has emerged to meet its demands. The metadata storage architecture of HDFS, the distributed file system of Hadoop (the mainstream big data technology), has long suffered from poor scalability and high write latency. The scalability solution in the official 2.0 release, HDFS Federation, adds support for multiple NameNodes/namespaces but remains incomplete: it only addresses the scalability of the original HDFS, ignores the heterogeneous performance of NameNodes when allocating metadata, and does not provide dynamic load balancing across the NameNode cluster. To address this, a dynamic load-balancing algorithm for distributed NameNodes is proposed. It keeps multiple metadata replicas and backs them up adaptively across heterogeneous nodes, so that metadata is distributed dynamically according to each node's performance and current load, which sustains the performance of the metadata server cluster. Combined with a caching strategy and an automatic recovery (failover) mechanism, the algorithm also improves metadata read/write performance and availability. In experimental validation, the algorithm achieved fairly satisfactory results.

Key words: Big data, Distributed storage, Metadata management, Hadoop distributed file system (HDFS)
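The abstract does not spell out the placement rule. As a rough illustration only, the following Python sketch shows one way heterogeneity- and load-aware placement of metadata replicas could work: each metadata entry goes to the live NameNodes with the lowest load normalized by an assumed performance weight, and reads fail over to the next live replica. All names (`NameNode`, `capacity`, `MetadataCluster`, `place`, `lookup`), the capacity scale, and the replica count of 2 are hypothetical. This is not the authors' algorithm, it uses no Hadoop API, and it omits the caching strategy and re-replication after a failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NameNode:
    """A metadata server. `capacity` is an assumed relative-performance weight (hypothetical)."""
    name: str
    capacity: float      # e.g. 2.0 means "about twice as fast as a 1.0 node" (assumed scale)
    load: int = 0        # metadata entries currently assigned to this node
    alive: bool = True

    def utilization(self) -> float:
        # Load normalized by capacity, so faster nodes absorb proportionally more entries.
        return self.load / self.capacity


@dataclass
class MetadataCluster:
    nodes: List[NameNode]
    replicas: int = 2                                   # assumed replica count
    placement: Dict[str, List[str]] = field(default_factory=dict)

    def place(self, path: str) -> List[str]:
        """Put `path` on the `replicas` live nodes with the lowest normalized load."""
        live = [n for n in self.nodes if n.alive]
        if len(live) < self.replicas:
            raise RuntimeError("not enough live NameNodes for the replica count")
        # Ties on utilization are broken by node name to keep the sketch deterministic.
        ranked = sorted(live, key=lambda n: (n.utilization(), n.name))
        chosen = ranked[: self.replicas]
        for n in chosen:
            n.load += 1
        self.placement[path] = [n.name for n in chosen]
        return self.placement[path]

    def lookup(self, path: str) -> str:
        """Serve a read from the first live replica holder (simple read failover)."""
        by_name = {n.name: n for n in self.nodes}
        for name in self.placement[path]:
            if by_name[name].alive:
                return name
        raise RuntimeError(f"all replicas of {path} are unavailable")


if __name__ == "__main__":
    cluster = MetadataCluster([NameNode("nn1", 1.0), NameNode("nn2", 2.0), NameNode("nn3", 1.0)])
    for i in range(8):
        cluster.place(f"/data/file{i}")
    print({n.name: n.load for n in cluster.nodes})  # {'nn1': 4, 'nn2': 8, 'nn3': 4}
    cluster.nodes[0].alive = False                   # simulate nn1 failing
    print(cluster.lookup("/data/file0"))             # nn2
```

With capacities in the ratio 1:2:1, eight two-replica placements end with loads of 4, 8 and 4, i.e. proportional to capacity. This greedy sketch only balances at insertion time; a fully dynamic scheme would also migrate existing entries when node loads or performance drift.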

