计算机科学 ›› 2019, Vol. 46 ›› Issue (10): 202-208.doi: 10.11896/jsjkx.180901623

• 软件与数据库技术 • 上一篇    下一篇

云存储系统中最小开销的数据副本布局转换策略

吴修国, 刘翠   

  1. (山东财经大学管理科学与工程学院 济南250014)
  • 收稿日期:2018-09-03 修回日期:2019-01-29 出版日期:2019-10-15 发布日期:2019-10-21
  • 通讯作者: 吴修国(1975-),男,博士,副教授,主要研究方向为云计算环境下的数据管理以及任务调度等,E-mail:xiuguosd@163.com。
  • 作者简介:刘翠(1994-),女,硕士生,主要研究方向为云环境下的任务调度等。
  • 基金资助:
    本文受国家自然科学基金(61571272),山东省自然基金(ZR2016FM01)资助。

Data Replicas Distribution Transition Strategy in Cloud Storage System

WU Xiu-guo, LIU Cui   

  1. (School of Management Science and Engineering,Shandong University of Finance and Economics,Jinan 250014,China)
  • Received:2018-09-03 Revised:2019-01-29 Online:2019-10-15 Published:2019-10-21

摘要: 副本技术是提高云存储系统中数据可靠性访问和系统容错性的常用策略。依据用户需求以及环境变化,及时对数据副本布局进行动态调整,是目前副本管理研究的重要内容之一。然而,现有研究大都以副本布局转换是自动完成的为前提,仅关注于数据副本数目与位置等副本布局方案设计,较少涉及副本布局转换的任务调度问题。事实上,副本布局转换是有关多数据中心数据副本迁移与删除操作的复杂任务调度问题,不同的任务调度策略占用的空间、时间不同,由此导致成本、效率等存在较大差异。基于此,首先给出云存储系统中面向多数据中心的数据副本布局转换任务调度模型,以及该问题的可行性分析。然后,从降低成本的角度给出最小开销的数据副本布局转换任务调度问题的定义,并基于0-1背包问题证明其是NP完全的。在此基础上,给出随机(Random)、最小传输开销优先(MTCF)、最大机会成本优先(MOCF)以及同数据最小传输成本优先(MTCFSD)等副本布局转换任务调度策略。最后,以CloudSim为仿真平台进行了模拟实验,结果表明,最小开销的数据副本布局转换策略与同类算法相比,在传输次数上减少了约60%,相对开销降低了约50%,证明了转换策略的可靠性与有效性,从而进一步提升了云存储系统的性能。

关键词: 布局转换, 成本, 副本, 任务调度, 云存储系统

Abstract: Replication is a common method used to improve the data access reliability and system fault tolerance in the cloud storage system.It is one of the most important topics to reschedule the replicas distribution dynamically according to the changes of users’ requirements and environment in the replicas management.However,current replicas redistribution strategies mostly focus on the new replicas schemes,such as replicas number and their placements,on the pre-mise that it can be completed automatically,without taking into account the task scheduling problem in practice.In fact,data replica distribution transition is a complex scheduling problem involving data replicas migration and deletion among data centers.In addition,the required disk space and time of different scheduling strategies have large differences,lea-ding to big difference in cost and efficiency.In this way,this paper first proposed a data replicas distribution transition model and feasibility analysis in the cloud storage environment.Also a minimum-cost data replicas distribution transition problem definition was presented,and then its complexity was proven based on 0-1 Knapsack problem.Besides random strategy,three transition strategies (MTCF,MOCF and MTCFSD) were given from minimum-cost view.In the end,a series of experiments were performed on CloudSim simulation platform.The results show that nearly 60% of transmission number and 50% of transmission cost are reduced compared with other methods,indicating the proposed method’sreliability and effectiveness,so as to further improve the cloud storage system performance.

Key words: Cloud storage system, Cost, Distribution transition, Replicas, Tasks scheduling

中图分类号: 

  • TP391
[1]SUN X,LI Q Z,ZHAO P,et al.An optimized replica distribution method for peer-to-peer network[J].Chinese Journal of Computers,2014,37(6):1424-1434.(in Chinese)
孙新,李庆洲,赵璞,等.对等网络中一种优化的副本分布方法[J].计算机学报,2014,37(6):1424-1434.
[2]POLATO I,RÉ R,GOLDMAN A,et al.A comprehensive view of Hadoop research-a systematic literature review[J].Journal of Network & Computer Applications,2014,46:1-25.
[3]HAMROUNI T,SLIMANI S,CHARRADA F B.A survey of dynamic replication and replica selection strategies based on data mining techniques in data grids[J].Engineering Applications of Artificial Intelligence,2016,48(C):140-158.
[4]WU X.Data sets replicas placements strategy from cost-effective view in the cloud[J].Scientific Programming,2016(11):1-13.
[5]SAADAT N,RAHMANI A M.PDDRA:A new pre-fetching based dynamic data replication algorithm in data grids[J].Future Generation Computer Systems,2012,28(4):666-681.
[6]YANG X L,QIAN C,ZHU F X.Evaluation method of big data service based on cloud computing[J].Computer Science,2018,45(5):295-299.(in Chinese)
阳小兰,钱程,朱福喜.基于云计算的大数据服务资源评价方法[J].计算机科学,2018,45(5):295-299.
[7]LIU W,PENG S,DU W,et al.Security-aware intermediate data placement strategy in scientific cloud workflows[J].Knowledge &Information Systems,2014,41(2):423-447.
[8]LI W,YANG Y,YUAN D.Ensuring Cloud data reliability with minimum replication by proactive replica checking[J].IEEE Transactions on Computers,2016,65(5):1494-1506.
[9]GRACE R K,MANIMEGALAI R.Dynamic replica placement and selection strategies in data grids-A comprehensive survey[J].Journal of Parallel & Distributed Computing,2014,74(2):2099-2108.
[10]WU X G.Minimum cost data replicas distribution based on dynamic programming in the cloud storage system[J].Computer Engineering,2017,43(7):29-37.(in Chinese)
吴修国.云存储系统中基于动态规划的最小开销数据副本布局研究[J].计算机工程,2017,43(7):29-37.
[11]LIU G,LI J,XU J.An Improved Min-Min Algorithm in Cloud Computing[C]//Proceedings of the 2012 International Conference of Modern Computer Science and Applications.Springer Berlin Heidelberg,2013:47-52.
[12]MA J,LI W,FU T,et al.A Novel Dynamic Task Scheduling Algorithm Based on Improved Genetic Algorithm in Cloud Computing[M]//Wireless Communications,Networking and Applications.Springer India,2016:184-186.
[13]TANG Y T,HUANG J,XIAO Q.Task scheduling algorithm for MapReduce based on DAG[J].Computer Science,2014,41(S1):42-46.(in Chinese)
唐一韬,黄晶,肖球.一种基于DAG的MapReduce任务调度算法[J].计算机科学,2014,41(S1):42-46.
[14]KHULLER S,KIM Y A,MALEKIAN A.Improved Approximation Algorithms for Data Migration[J].Algorithmica,2012,63(1/2):347-362.
[15]LU P,ZHANG L,LIU X,et al.Highly efficient data migration and backup for big data applications in elastic optical inter-data-center networks[J].Network IEEE,2015,29(5):36-42.
[16]ANDRONIKOU V,MAMOURAS K,TSERPES K,et al.Dy-namic Qos-aware data replication in grid environments based on data “importance”[J].Future Generation Computer Systems,2012,28(3):544-553.
[17]ATENIESE G,BURNS R,CURTMOLA R,et al.Provable data possession at untrusted stores[C]//Proceedings of ACM Conference on Computer and Communications Security.New York:ACM Press,2007:598-609.
[18]LOUKOPOULOS T,TZIRITAS N,LAMPSAS P,et al.Implementing replica placements:feasibility and cost minimization[C]//Proceedings of Parallel and Distributed Processing Symposium.New York:IEEE Press,2007:1-10.
[19]CALHEIROS R N,RANJAN R,BELOGLAZOV A,et al. Cloudsim:a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms[J].Software Practice & Experience,2011,41(1):23-50.
[1] 邵子灏, 杨世宇, 马国杰.
室内信息服务的基础——低成本定位技术研究综述
Foundation of Indoor Information Services:A Survey of Low-cost Localization Techniques
计算机科学, 2022, 49(9): 228-235. https://doi.org/10.11896/jsjkx.210900260
[2] 严磊, 张功萱, 王添, 寇小勇, 王国洪.
混合云下具有交付期约束的众包任务调度算法
Scheduling Algorithm for Bag-of-Tasks with Due Date Constraints on Hybrid Clouds
计算机科学, 2022, 49(5): 244-249. https://doi.org/10.11896/jsjkx.210300120
[3] 左园林, 龚月姣, 陈伟能.
成本受限条件下的社交网络影响最大化方法
Budget-aware Influence Maximization in Social Networks
计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228
[4] 田冰川, 田臣, 周宇航, 陈贵海, 窦万春.
减少Hadoop集群中网络队头阻塞的调度算法
Reducing Head-of-Line Blocking on Network in Hadoop Clusters
计算机科学, 2022, 49(3): 11-22. https://doi.org/10.11896/jsjkx.210900117
[5] 谭双杰, 林宝军, 刘迎春, 赵帅.
基于机器学习的分布式星载RTs系统负载调度算法
Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning
计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126
[6] 沈彪, 沈立炜, 李弋.
空间众包任务的路径动态调度方法
Dynamic Task Scheduling Method for Space Crowdsourcing
计算机科学, 2022, 49(2): 231-240. https://doi.org/10.11896/jsjkx.210400249
[7] 薛艳芬, 高继梅, 范贵生, 虞慧群, 许亚杰.
边缘计算中基于能耗感知的容错协同任务执行算法
Energy-aware Fault-tolerant Collaborative Task Execution Algorithm in Edge Computing
计算机科学, 2021, 48(6A): 374-382. https://doi.org/10.11896/jsjkx.200900027
[8] 王政, 姜春茂.
一种基于三支决策的云任务调度优化算法
Cloud Task Scheduling Algorithm Based on Three-way Decisions
计算机科学, 2021, 48(6A): 420-426. https://doi.org/10.11896/jsjkx.201000023
[9] 蔡凌峰, 魏祥麟, 邢长友, 邹霞, 张国敏.
故障场景下的边缘计算DAG任务重调度方法
Failure-resilient DAG Task Rescheduling in Edge Computing
计算机科学, 2021, 48(10): 334-342. https://doi.org/10.11896/jsjkx.210300304
[10] 张龙信, 周立前, 文鸿, 肖满生, 邓晓军.
基于异构云计算的成本约束下的工作流能量高效调度算法
Energy Efficient Scheduling Algorithm of Workflows with Cost Constraint in Heterogeneous Cloud Computing Systems
计算机科学, 2020, 47(8): 112-118. https://doi.org/10.11896/jsjkx.200300038
[11] 刘静, 方贤文.
基于成本对齐的业务流程变化挖掘方法
Mining Method of Business Process Change Based on Cost Alignment
计算机科学, 2020, 47(7): 78-83. https://doi.org/10.11896/jsjkx.190600140
[12] 高子妍, 王勇.
面向云服务的分布式消息系统负载均衡策略
Load Balancing Strategy of Distributed Messaging System for Cloud Services
计算机科学, 2020, 47(6A): 318-324. https://doi.org/10.11896/JsJkx.191100012
[13] 孙敏, 陈中雄, 叶侨楠.
云环境下基于HEDSM的工作流调度策略
Workflow Scheduling Strategy Based on HEDSM Under Cloud Environment
计算机科学, 2020, 47(6): 252-259. https://doi.org/10.11896/jsjkx.190400047
[14] 洪海诚,陈丹伟.
基于RBEC的副本动态存储方法
Replica Dynamic Storage Based on RBEC
计算机科学, 2020, 47(2): 313-319. https://doi.org/10.11896/jsjkx.181102161
[15] 胡俊钦, 张佳俊, 黄引豪, 陈星, 林兵.
边缘环境下DNN应用的计算迁移调度技术
Computation Offloading Scheduling Technology for DNN Applications in Edge Environment
计算机科学, 2020, 47(10): 247-255. https://doi.org/10.11896/jsjkx.190900106
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!