Computer Science, 2026, Vol. 53, Issue (2): 196-206. doi: 10.11896/jsjkx.241200199

• Database & Big Data & Data Science •

Data Placement Strategy Based on Erasure Code in Data Space

LIN Bing1,3, JIANG Haiou2, TAN Xiao1, CHEN Xing3,4, ZHENG Yuheng3,4

    1 College of Physics and Energy, Fujian Normal University, Fuzhou 350117, China
    2 Advanced Institute of Big Data, Beijing, National Key Laboratory of Data Space Technology and System, Beijing 100195, China
    3 Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China
    4 College of Computer and Data Science/College of Software, Fuzhou University, Fuzhou 350108, China
  • Received: 2024-12-30 Revised: 2025-03-28 Online: 2026-02-10
  • Corresponding author: JIANG Haiou (seagullwill@foxmail.com)
  • About author: (wheellx@163.com)
    LIN Bing, born in 1986, Ph.D., associate professor, postgraduate supervisor, is a member of CCF (No.83773M). His main research interests include cloud computing technology and computational intelligence.
    JIANG Haiou, born in 1987, Ph.D., associate researcher. Her main research interests include cloud computing technology, big data and service computing.
  • Supported by:
    National Natural Science Foundation of China (62072108), University-Industry Cooperation Project of Fujian Province (2022H6024, 2021H6026), Funded Project of the National Key Laboratory of Data Space Technology and Systems (QZQC2024015-3), Fujian Provincial Special Fund for Promoting the High-Quality Development of Marine and Fishery Industries (FJHYF-ZH-2023-02), and Fujian Province Key Technology Innovation and Industrialization Project (2024XQ004).

Abstract: To address the multi-objective data placement problem for scientific workflows in cloud-edge environments, this paper considers data reliability, workflow execution latency, and data center load balancing, and proposes an erasure-code-based data placement strategy in the data space. First, erasure-code redundancy, which has low storage overhead, is used during scientific workflow execution to provide fault tolerance, and a data space is constructed to manage the diverse data generated by the workflow. Second, an Interactive Multi-Objective Evolution Algorithm (IMOEA) is designed to optimize execution latency and data center load balancing simultaneously. By interacting with the decision-maker, the algorithm generates solutions that better match the decision-maker's expectations, which improves the personalization and acceptability of the optimization results. Experimental results show that, for workflows of different scales and types, IMOEA reduces the Space (SP) metric by 2.3%~36.34%, 15.71%~44.01%, and 22.50%~47.64% and improves the Hypervolume (HV) metric by 7.84%~38.23%, 14.65%~48.4%, and 45.01%~109.45% compared with the DIST, MOGA, and RAND algorithms, respectively. In addition, IMOEA responds well to decision-makers' preferences and finds data placement solutions that satisfy them.

Key words: Data space, Edge-cloud environments, Scientific workflows, Data placement, Erasure code, Multi-objective optimization
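
The abstract's claim that erasure coding gives fault tolerance at low storage overhead can be illustrated with the simplest erasure code: one XOR parity block over k data blocks. This is only a minimal sketch under stated assumptions, not the scheme used in the paper. The abstract does not say which code or parameters the authors use, so the function names, the choice k = 4, and the sample payload below are all hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal blocks (zero-padded) and append one XOR parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks):
    """Rebuild the single missing block (marked None) from the surviving blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    blocks = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of all surviving blocks equals the lost block.
        blocks[missing[0]] = xor_blocks(survivors)
    return blocks


def storage_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data for k data blocks plus m parity blocks."""
    return (k + m) / k


# Hypothetical example: a dataset striped over 5 data centers (4 data + 1 parity).
payload = b"scientific workflow intermediate dataset"
stored = encode(payload, k=4)
damaged = list(stored)
damaged[2] = None  # the data center holding block 2 fails
restored = recover(damaged)
decoded = b"".join(restored[:4])[:len(payload)]
```

With k = 4 and one parity block, `storage_overhead(4, 1)` is 1.25 bytes stored per byte of data, whereas keeping two full copies costs 2.0; both layouts survive the loss of one block or copy. Codes that tolerate more failures, such as Reed-Solomon codes, follow the same (k + m) / k accounting.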

CLC Number: TP338