Computer Science ›› 2025, Vol. 52 ›› Issue (2): 42-47. doi: 10.11896/jsjkx.231200021

• Database & Big Data & Data Science •

Study on Distributed Hybrid Storage Based on Erasure Coding and Replication

FU Xiong, SONG Zhaoyang, WANG Junchang, DENG Song

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2023-12-04 Revised: 2024-04-14 Online: 2025-02-15 Published: 2025-02-17
  • Corresponding author: FU Xiong (fux@njupt.edu.cn)
  • About author: FU Xiong, born in 1979, Ph.D., professor. His main research interests include cloud computing and distributed computing.
  • Supported by:
    National Natural Science Foundation of China (61602264) and Key Research & Development Program (Social Development) of Jiangsu Province (BE2017743).

Abstract: With the rapid development of big data, cloud computing, computer and network technologies, Internet data has grown explosively, and the efficient storage of massive data has become an urgent problem. Traditional multi-replica redundancy incurs a high storage cost, which has drawn researchers' attention to new storage solutions. In this context, a distributed hybrid storage strategy based on erasure coding and replication is proposed. According to data characteristics, the strategy stores hot data as replicas to ensure high reliability and performance, and stores cold data with erasure coding to improve storage utilization. Data files are divided into hot files and cold files based on Newton's law of cooling, and an algorithm for adaptive data temperature identification and dynamic allocation of hot and cold data is introduced. The system can thus adjust the ratio of hot to cold data automatically at runtime and then switch the storage strategy of each file according to its current temperature, which reflects the system's adaptability in a dynamic environment. This enhances the system's ability to cope with dynamic workloads and offers a new paradigm for improving the efficiency and flexibility of distributed storage systems in practical applications, which is significant at both the academic and practical levels. Simulation experiments verify the effectiveness and usability of the strategy, which provides new ideas for the optimization of distributed storage systems.

Key words: Big data, Replica replication, Erasure coding, Hot and cold data, Storage utilization rate
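
The abstract does not give the temperature formula or the allocation algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an ambient temperature of zero, a fixed heat increment per access, and a fixed hot/cold threshold (the paper adjusts the hot/cold ratio adaptively). The names `FileStat`, `cool`, `record_access`, `choose_policy`, and the parameters `k`, `heat`, and `threshold` are all hypothetical. The two overhead helpers show the storage cost that motivates the split: raw bytes stored per byte of user data.

```python
import math
from dataclasses import dataclass


@dataclass
class FileStat:
    name: str
    temperature: float = 0.0  # accumulated "heat"; hypothetical unit
    last_update: float = 0.0  # time of the last update, in hours


def cool(temperature: float, elapsed: float, k: float) -> float:
    """Newton's law of cooling with ambient temperature 0: T(t) = T0 * exp(-k * t)."""
    return temperature * math.exp(-k * elapsed)


def record_access(f: FileStat, now: float, k: float = 0.1, heat: float = 1.0) -> None:
    """Decay the stored temperature up to `now`, then add heat for this access."""
    f.temperature = cool(f.temperature, now - f.last_update, k) + heat
    f.last_update = now


def choose_policy(f: FileStat, now: float, threshold: float = 1.0, k: float = 0.1) -> str:
    """Assumed rule: hot files are replicated, cold files are erasure coded."""
    current = cool(f.temperature, now - f.last_update, k)
    return "replication" if current >= threshold else "erasure_coding"


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(copies)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per byte of user data for a (data + parity) block code."""
    return (data_blocks + parity_blocks) / data_blocks


if __name__ == "__main__":
    log = FileStat("access.log")
    for t in (0.0, 1.0, 2.0):  # three accesses in quick succession
        record_access(log, now=t)
    print(choose_policy(log, now=2.0))   # just accessed
    print(choose_policy(log, now=48.0))  # untouched for almost two days
    print(replication_overhead(3), erasure_overhead(6, 3))
```

With these assumed parameters, the file is classified as hot right after its accesses and as cold once it has been idle long enough for the exponential decay to pull its temperature below the threshold. Three full replicas cost 3.0 bytes of raw storage per byte of data, whereas a 6+3 erasure code costs 1.5, which is the utilization gain the strategy targets for cold data.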

CLC Number: 

  • TP393