Computer Science ›› 2026, Vol. 53 ›› Issue (2): 196-206. doi: 10.11896/jsjkx.241200199

• Database & Big Data & Data Science •

Data Placement Strategy Based on Erasure Code in Data Space

LIN Bing1,3, JIANG Haiou2, TAN Xiao1, CHEN Xing3,4, ZHENG Yuheng3,4

    1 College of Physics and Energy, Fujian Normal University, Fuzhou 350117, China
    2 Advanced Institute of Big Data, Beijing, National Key Laboratory of Data Space Technology and System, Beijing 100195, China
    3 Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China
    4 College of Computer and Data Science/College of Software, Fuzhou University, Fuzhou 350108, China
  • Received: 2024-12-30 Revised: 2025-03-28 Published: 2026-02-10
  • About author: LIN Bing, born in 1986, Ph.D., associate professor, postgraduate supervisor, is a member of CCF (No.83773M). His main research interests include cloud computing technology and computational intelligence.
    JIANG Haiou, born in 1987, Ph.D., associate researcher. Her main research interests include cloud computing technology, big data and service computing.
  • Supported by:
    Natural Science Foundation of China (62072108), University-Industry Cooperation of Fujian Province (2022H6024, 2021H6026), Founded Projects of the National Key Laboratory of Data Space Technology and Systems (QZQC2024015-3), Fujian Provincial Special Fund for Promoting the High-Quality Development of Marine and Fishery Industries (FJHYF-ZH-2023-02) and Fujian Province Key Technology Innovation and Industrialization Projects (2024XQ004).

Abstract: To address the multi-objective data placement problem for integrated data in scientific workflows in cloud-edge environments, this paper considers data reliability, workflow execution latency, and data center load balancing, and proposes an erasure-code-based data placement strategy within a data space. First, an erasure-code redundancy technique with low storage overhead is introduced to provide fault tolerance during scientific workflow execution, and a data space is constructed to manage the diverse data generated by the workflow. Second, an Interactive Multi-Objective Evolution Algorithm (IMOEA) is designed to optimize execution latency and data center load balancing simultaneously. By interacting with decision-makers, the algorithm generates solutions that better match their expectations, which improves the personalization and acceptability of the optimization results. Experimental results show that, for workflows of different scales and types, compared with other algorithms such as DIST, MOGA, and RAND, IMOEA reduces the spacing metric (SP) by 2.3%~36.34%, 15.71%~44.01%, and 22.50%~47.64%, and improves the hypervolume metric (HV) by 7.84%~38.23%, 14.65%~48.4%, and 45.01%~109.45%, respectively. In addition, IMOEA responds effectively to decision-makers’ preferences and finds satisfactory data placement solutions.
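The two quality indicators named in the abstract can be illustrated with a minimal Python sketch. It assumes a bi-objective minimization problem (for example, execution latency and a load-imbalance score), uses Schott's spacing definition for SP, and computes HV as the area dominated by the front relative to a reference point. The function names, the sample front, and the reference point are hypothetical and are not taken from the paper.

```python
import math

def spacing(front):
    """Schott's spacing (SP): lower means a more evenly spread front."""
    n = len(front)
    if n < 2:
        return 0.0
    # d[i]: Manhattan distance from solution i to its nearest neighbour
    d = [
        min(sum(abs(a - b) for a, b in zip(p, q))
            for j, q in enumerate(front) if j != i)
        for i, p in enumerate(front)
    ]
    mean = sum(d) / n
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (n - 1))

def hypervolume_2d(front, ref):
    """Area dominated by a two-objective minimization front, bounded by ref.

    Higher means the front is closer to the ideal point and/or wider.
    """
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:          # sweep in order of increasing first objective
        if f2 < best_f2:        # skip points dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

# Hypothetical front: (latency, load imbalance), both to be minimized.
front = [(1, 3), (2, 2), (3, 1)]
print(spacing(front))                  # 0.0: neighbours are equally spaced
print(hypervolume_2d(front, (4, 4)))   # 6.0
```

Under these definitions a lower SP and a higher HV both indicate a better front, which matches the direction of the improvements reported above.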

Key words: Data space, Edge-cloud environments, Scientific workflows, Data placement, Erasure code, Multi-objective optimization

CLC Number: TP338