Computer Science, 2021, Vol. 48, Issue (2): 1-12. doi: 10.11896/jsjkx.201000149
张晓1,2,3, 张思蒙1,2, 石佳1,2, 董聪1,2, 李战怀1,2,3
ZHANG Xiao1,2,3, ZHANG Si-meng1,2, SHI Jia1,2, DONG Cong1,2, LI Zhan-huai1,2,3
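Abstract: Ceph is a unified distributed storage system that provides storage services through three interfaces: block, file, and object. Unlike traditional distributed storage systems, it manages metadata without a central node, which gives it good scalability and performance that grows linearly. After more than ten years of development, Ceph has been widely adopted in cloud computing and big data storage systems. As the underlying platform for cloud computing, Ceph provides storage for virtual machines and can also directly provide object storage and NAS file services. Ceph supports the storage needs of the many operating systems and applications running in cloud computing systems, and its performance strongly affects the virtual machines and applications built on it, so performance optimization of Ceph storage systems has long been a research focus in both academia and industry. This paper first introduces the architecture and characteristics of Ceph. It then summarizes existing performance optimization techniques in three categories: improvements to internal mechanisms, optimizations for new hardware, and application-based optimizations, surveying recent research on Ceph storage and optimization. Finally, it discusses future work in this field, with the aim of providing a useful reference for researchers working on performance optimization of distributed storage systems.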