Computer Science, 2021, Vol. 48, Issue (2): 1-12. doi: 10.11896/jsjkx.201000149

• New Distributed Computing Technologies and Systems

Review on Performance Optimization of Ceph Distributed Storage System

ZHANG Xiao1,2,3, ZHANG Si-meng1,2, SHI Jia1,2, DONG Cong1,2, LI Zhan-huai1,2,3

  1 School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
  2 MIIT Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Xi'an 710129, China
  3 National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2020-10-16 Revised: 2020-11-26 Online: 2021-02-15 Published: 2021-02-04
  • Corresponding author: ZHANG Xiao (zhangxiao@nwpu.edu.cn)
  • About author: ZHANG Xiao, born in 1978, Ph.D., is a member of China Computer Federation. His main research interests include storage systems, computer networks and distributed file systems.
  • Supported by:
    The National Key Research and Development Program (2018YFB1004401) and the Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L192027).

Abstract: Ceph is a unified distributed storage system that provides storage services through three types of interfaces: block, file and object. Unlike traditional distributed storage systems, it manages metadata without a central node, which gives it good scalability and linearly growing performance. After more than ten years of development, Ceph has been widely used in cloud computing and big data storage systems. As the underlying platform of cloud computing, Ceph not only provides storage services for virtual machines but also directly provides object storage and NAS file services. Ceph supports the storage requirements of various operating systems and applications in cloud computing systems, and its performance strongly affects the virtual machines and applications running on it. Performance optimization of the Ceph storage system has therefore long been a research hotspot in academia and industry. This paper first introduces the architecture and characteristics of Ceph. It then summarizes existing performance optimization techniques from three aspects, namely improvements to internal mechanisms, optimization for new hardware, and application-based optimization, and reviews recent research on Ceph storage and optimization. Finally, it discusses future work in this field, hoping to provide a valuable reference for researchers working on performance optimization of distributed storage systems.

Key words: Ceph distributed storage system, Non-volatile memory, Performance optimization, Solid state disk, Unified storage

CLC Number:

  • TP319