Computer Science (计算机科学), 2017, Vol. 44, Issue (4): 85-89. doi: 10.11896/j.issn.1002-137X.2017.04.019

• NASAC 2015 •

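Scheduling Strategy of Hierarchical Storage about Replication in Cloud Storage

YANG Dong-ju (杨冬菊), LI Qing (李青)

  1. Research Center for Cloud Computing, North China University of Technology, Beijing 100144; Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the Key Project of the Science and Technology Plan of the Beijing Municipal Education Commission, "Research on Cloud Service Communities Supporting Data Resource Linkage" (KZ201310009009), and by the Innovation Team Building and Teacher Career Development Program for Beijing Municipal Universities.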

Abstract: HDFS uses a random storage strategy. When some nodes in a cluster are cheap hosts, frequently accessed data may be stored on those low-performance nodes, which lengthens access time and reduces cluster efficiency. To mitigate this problem, an improved scheduling strategy for hierarchical storage of replicas is proposed. To reduce the number of replica scheduling operations, each data node is first evaluated by its CPU load, memory load, network load, storage load, and network distance, and a high-performance node is then selected for storage. Replica scheduling is driven by the access frequency of the replicas on each node, combined with hardware configuration, so that frequently accessed replicas are stored, as far as possible, on high-performance, high-configuration nodes to speed up cluster response. Experimental results show that the improved strategy raises replica access efficiency and improves load balancing for data storage in heterogeneous clusters.

Key words: Cloud storage, HDFS, Hierarchical storage, Replication scheduling
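
The two steps named in the abstract, scoring nodes and then placing replicas by access frequency, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract lists the five evaluation factors but does not say how they are combined, so the weighted sum, the `WEIGHTS` values, the `Node` fields, the per-node `capacity` limit, and the function names `score` and `schedule` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the abstract names the five factors
# but not how they are combined.
WEIGHTS = {"cpu": 0.25, "mem": 0.20, "net": 0.20, "disk": 0.20, "dist": 0.15}


@dataclass
class Node:
    name: str
    cpu: float   # CPU load, normalized to [0, 1]
    mem: float   # memory load, normalized to [0, 1]
    net: float   # network load, normalized to [0, 1]
    disk: float  # storage load, normalized to [0, 1]
    dist: float  # network distance, normalized to [0, 1]
    high_config: bool = True  # False for a cheap host (assumed flag)
    capacity: int = 1         # illustrative limit on replicas per node
    replicas: list = field(default_factory=list)


def score(node: Node) -> float:
    """Return a performance score; lower load and distance give a higher score."""
    return 1.0 - sum(w * getattr(node, k) for k, w in WEIGHTS.items())


def schedule(access_freq: dict, nodes: list) -> dict:
    """Place the most frequently accessed replicas on the best nodes first."""
    # Prefer high-configuration nodes, then higher performance scores.
    ranked = sorted(nodes, key=lambda n: (n.high_config, score(n)), reverse=True)
    placement = {}
    for replica in sorted(access_freq, key=access_freq.get, reverse=True):
        target = next((n for n in ranked if len(n.replicas) < n.capacity), None)
        if target is None:
            raise ValueError("no node has free capacity")
        target.replicas.append(replica)
        placement[replica] = target.name
    return placement


# Usage: one cheap host and one well-provisioned host (made-up load figures).
nodes = [
    Node("cheap", 0.7, 0.7, 0.7, 0.7, 0.7, high_config=False, capacity=2),
    Node("fast", 0.2, 0.3, 0.1, 0.2, 0.1),
]
placement = schedule({"blk_cold": 5, "blk_hot": 100}, nodes)
print(placement)
```

In this sketch the scores are computed once, at scheduling time. A running cluster would presumably refresh the load figures periodically and reschedule only when a replica's access frequency no longer matches the tier of the node that holds it.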

