Computer Science ›› 2025, Vol. 52 ›› Issue (2): 42-47. doi: 10.11896/jsjkx.231200021

• Database & Big Data & Data Science •

Study on Distributed Hybrid Storage Based on Erasure Coding and Replication

FU Xiong, SONG Zhaoyang, WANG Junchang, DENG Song   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2023-12-04 Revised: 2024-04-14 Online: 2025-02-15 Published: 2025-02-17
  • About author: FU Xiong, born in 1979, Ph.D, professor. His main research interests include cloud computing and distributed computing.
  • Supported by:
    National Natural Science Foundation of China (61602264) and Key Research & Development Program (Social Development) of Jiangsu Province (BE2017743).

Abstract: With the rapid development of big data, cloud computing, computer, and network technologies, Internet data has grown explosively, and the efficient storage of massive data has become an urgent challenge for current Internet technology. Traditional multi-replica redundancy mechanisms incur high storage costs, which has drawn attention to alternative storage solutions. In this context, a distributed hybrid storage strategy based on erasure coding and replication is proposed. Based on data characteristics, the strategy uses replication for hot data to ensure high reliability and performance, and erasure coding for cold data to improve storage utilization. Data files are divided into hot files and cold files according to Newton's law of cooling, and an adaptive data temperature identification algorithm and an adaptive dynamic allocation algorithm for hot and cold data are introduced, so that the system can automatically adjust the ratio of hot and cold data at runtime and then adjust the storage strategy according to the real-time temperature of the data. This enhances the system's adaptability to dynamic workloads and offers a new approach to improving the efficiency and flexibility of distributed storage systems in practical applications, which is of both academic and practical significance. The effectiveness and usability of the strategy have been verified through simulation experiments, which provides new ideas for the optimization of distributed storage systems.

Key words: Big data, Replica replication, Erasure coding, Hot and cold data, Storage utilization rate

CLC Number: 

  • TP393
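
The hot/cold classification described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption for illustration: the decay form T(t) = T0 * exp(-rate * t) (Newton's law of cooling with an ambient temperature of zero), the constants COOLING_RATE, HEAT_PER_ACCESS and HOT_THRESHOLD, and all function names are hypothetical and are not the authors' algorithm. The two utilization helpers use the standard definitions: 1/n for n-way replication and k/(k+m) for a (k, m) erasure code.

```python
import math
from dataclasses import dataclass

# All three constants are hypothetical values chosen for illustration only.
COOLING_RATE = 0.1      # assumed decay constant, per hour
HEAT_PER_ACCESS = 1.0   # assumed heat added by each access
HOT_THRESHOLD = 5.0     # assumed cut-off between hot and cold files


@dataclass
class FileStat:
    name: str
    temperature: float = 0.0
    last_update: float = 0.0  # time of the last temperature update, in hours


def cool(temperature: float, elapsed: float, rate: float = COOLING_RATE) -> float:
    """Newton's law of cooling with ambient temperature 0: T0 * exp(-rate * t)."""
    return temperature * math.exp(-rate * elapsed)


def record_accesses(stat: FileStat, now: float, accesses: int) -> None:
    """Cool the stored temperature to `now`, then add heat for new accesses."""
    cooled = cool(stat.temperature, now - stat.last_update)
    stat.temperature = cooled + HEAT_PER_ACCESS * accesses
    stat.last_update = now


def storage_scheme(stat: FileStat, now: float) -> str:
    """Hot files are replicated; cold files are erasure coded."""
    current = cool(stat.temperature, now - stat.last_update)
    return "replication" if current >= HOT_THRESHOLD else "erasure_coding"


def replication_utilization(copies: int) -> float:
    """Fraction of raw capacity holding user data under n-way replication."""
    return 1.0 / copies


def erasure_coding_utilization(k: int, m: int) -> float:
    """Fraction of raw capacity holding user data with k data and m parity blocks."""
    return k / (k + m)


if __name__ == "__main__":
    log = FileStat("access.log")
    record_accesses(log, now=0.0, accesses=10)
    print(storage_scheme(log, now=1.0))    # prints "replication"
    print(storage_scheme(log, now=24.0))   # prints "erasure_coding"
    print(replication_utilization(3), erasure_coding_utilization(6, 3))
```

Under these assumed constants, a file accessed ten times is still hot one hour later but has cooled below the threshold after a day, at which point it would move from replication (utilization 1/3 with three copies) to erasure coding (utilization 2/3 with six data and three parity blocks).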