Computer Science ›› 2021, Vol. 48 ›› Issue (11): 199-207. doi: 10.11896/jsjkx.200900009

• Database & Big Data & Data Science •

Data Placement Strategy of Scientific Workflow Based on Fuzzy Theory in Hybrid Cloud

LIU Zhang-hui1,2, ZHAO Xu1,2, LIN Bing2,3, CHEN Xing1,2

  1 College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
  2 Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China
  3 College of Physics and Energy, Fujian Normal University, Fuzhou 350117, China
  • Received: 2020-09-01 Revised: 2020-12-06 Online: 2021-11-15 Published: 2021-11-10
  • Corresponding author: LIN Bing (WheelLX@163.com)
  • About author: lzh@fzu.edu.cn
    LIU Zhang-hui, born in 1971, master, associate professor, postgraduate supervisor, is a member of China Computer Federation. His main research interests include big data technology and intelligent computing.
    LIN Bing, born in 1986, Ph.D., lecturer, postgraduate supervisor, is a member of China Computer Federation. His main research interests include cloud computing, intelligent computing, and their applications.
  • Supported by:
    National Key R&D Program of China (2018YFB1004800) and Guiding Project of Fujian Province (2018H0017).

Abstract: In a hybrid cloud environment, a reasonable data placement strategy is essential to the efficient execution of scientific workflows. Traditional data placement strategies for scientific workflows mainly assume a deterministic environment. In a real network environment, however, data transmission time is uncertain because of differing loads across data centers, bandwidth fluctuation, network congestion, and the characteristics of the computers themselves. To address this problem, a fuzzy adaptive discrete particle swarm optimization algorithm based on genetic algorithm operators (FGA-DPSO) is proposed. Built on fuzzy theory, it takes minimizing the fuzzy data transmission time as its objective and places scientific workflow data reasonably while satisfying the privacy requirements of the data sets and the capacity limits of the data centers. Experimental results show that the algorithm can effectively reduce the fuzzy data transmission time of scientific workflows in a hybrid cloud environment.

Key words: Data placement, Fuzzy theory, Hybrid cloud, Scientific workflow, Time optimization
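The objective and constraints named in the abstract can be illustrated with a small sketch. It is not the FGA-DPSO algorithm itself: it only shows how one candidate placement might be checked for feasibility and scored. Several things here are assumptions made for illustration and are not taken from the abstract: transmission time is modelled as a triangular fuzzy number, the spread factors 0.9 and 1.3 are arbitrary, defuzzification uses the weighting (l + 2m + u) / 4, each task runs at a fixed data center, and all names (`fuzzify`, `is_feasible`, `fuzzy_transfer_time`, the example data) are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: triangular fuzzy number (optimistic, most likely, pessimistic).
Fuzzy = Tuple[float, float, float]


def fuzzify(t: float, lo: float = 0.9, hi: float = 1.3) -> Fuzzy:
    """Turn a crisp time into a triangular fuzzy number (spread factors are arbitrary)."""
    return (t * lo, t, t * hi)


def fuzzy_add(a: Fuzzy, b: Fuzzy) -> Fuzzy:
    """Component-wise addition of two triangular fuzzy numbers."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def defuzzify(f: Fuzzy) -> float:
    """Collapse a fuzzy number to one score; this weighting is an assumption."""
    return (f[0] + 2 * f[1] + f[2]) / 4


def is_feasible(placement: Dict[str, str], sizes: Dict[str, float],
                pinned: Dict[str, str], capacity: Dict[str, float]) -> bool:
    """Check the privacy constraint (pinned data sets) and data-center capacity."""
    used = {dc: 0.0 for dc in capacity}
    for ds, dc in placement.items():
        if ds in pinned and pinned[ds] != dc:
            return False  # a private data set left its required data center
        used[dc] += sizes[ds]
    return all(used[dc] <= capacity[dc] for dc in capacity)


def fuzzy_transfer_time(placement: Dict[str, str], sizes: Dict[str, float],
                        task_inputs: Dict[str, List[str]], task_dc: Dict[str, str],
                        bandwidth: Dict[Tuple[str, str], float]) -> Fuzzy:
    """Sum the fuzzy time of moving every input data set to its task's data center."""
    total: Fuzzy = (0.0, 0.0, 0.0)
    for task, inputs in task_inputs.items():
        dst = task_dc[task]
        for ds in inputs:
            src = placement[ds]
            if src != dst:  # co-located data needs no transfer
                total = fuzzy_add(total, fuzzify(sizes[ds] / bandwidth[(src, dst)]))
    return total


# Hypothetical example: one private and one public data center, two data sets.
sizes = {"d1": 100.0, "d2": 40.0}
pinned = {"d1": "private"}                      # d1 is privacy-constrained
capacity = {"private": 120.0, "public": 500.0}
task_inputs = {"t1": ["d1", "d2"]}
task_dc = {"t1": "private"}
bandwidth = {("public", "private"): 10.0, ("private", "public"): 10.0}

candidate = {"d1": "private", "d2": "public"}
if is_feasible(candidate, sizes, pinned, capacity):
    score = defuzzify(fuzzy_transfer_time(candidate, sizes, task_inputs, task_dc, bandwidth))
```

A search procedure such as a discrete particle swarm would generate many such candidate placements, discard or penalize infeasible ones, and keep the one with the lowest score.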

CLC Number:

  • TP338