计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 61-65.doi: 10.11896/j.issn.1002-137X.2018.07.009

• 第三十三届全国信息存储技术学术会议 • 上一篇    下一篇

基于SSD-SMR混合存储的LSM树键值存储系统的性能优化

王洋洋,韦皓诚,柴云鹏   

  1. 中国人民大学信息学院 北京100872
  • 收稿日期:2017-07-16 出版日期:2018-07-30 发布日期:2018-07-30
  • 作者简介:王洋洋(1995-),男,硕士生,主要研究方向为分布式系统、大规模存储系统;韦皓诚(1993-),男,硕士生,主要研究方向为分布式系统、大规模存储系统;柴云鹏(1983-),男,博士,副教授,CCF会员,主要研究方向为分布式系统、大规模存储系统、云存储、Flash存储、节能存储系统,E-mail:ypchai@ruc.edu.cn(通信作者)。
  • 基金资助:
    本文受国家重点研发计划“云计算和大数据”重点专项项目(2018YFB1004400),国家自然科学基金重点项目(61732014),北京市自然科学基金面上项目(4172031),中国人民大学预研委托项目(团队基金)(16XNLQ02),计算机体系结构国家重点实验室开放课题(CARCH201702)资助。

Performance Optimization of LSM Tree Key-value Storage System Based on SSD-SMR Hybrid Storage

WANG Yang-yang,WEI Hao-cheng,CHAI Yun-peng   

  1. School of Information,Renmin University of China,Beijing 100872,China
  • Received:2017-07-16 Online:2018-07-30 Published:2018-07-30

摘要: 大数据对存储系统的可扩展性、性能和成本等方面提出了更高的要求。瓦记录(Shingled Magnetic Recor-ding,SMR)硬盘由于存储密度高、价格便宜,正逐步被广泛应用于大数据存储系统。但是,SMR硬盘的随机写性能较差,与快速的基于闪存的固态硬盘(Solid State Drive,SSD)一起构成混合存储时可以显著提升性能。同时,基于写优化的日志结构合并(Log-Structured Merge,LSM)树的键值存储已被广泛应用于许多NoSQL系统,如BigTable,Cassandra和HBase等。因此,如何基于新型的SSD-SMR混合存储构建出高性能的LSM树键值存储系统是一个具有很大研究价值的问题。首先建立基于SSD-SMR混合存储的LSM树键值系统的性能模型,然后针对SSD和SMR的硬件特征以及LSM树键值存储的软件特点,设计了一套面向SSD-SMR混合存储进行性能优化的LSM树键值存储系统,并基于LevelDB实现了该系统。在仅仅使用0.4%~2%空间的SSD的情况下,所提方法可以使SSD-SMR混合存储方案比普通磁盘方案的随机写性能提高20%,随机读性能提高5倍。

关键词: 大数据, 混合存储, 日志合并树, 闪存, 瓦记录磁盘

Abstract: Because of the higher requirements on the scalability,performance,and cost for storage systems proposed by big data,Shingled Magnetic Recording (SMR) disks are widely used in big data storage systems due to the high storage density and low cost.However,since the random write performance of SMR disks are usually weak,the hybrid storage consisted of both SMR disks and the fast Flash-based Solid State Drives (SSDs) can promote the performance significantly.Meanwhile,the write-optimized Log-Structured Merge (LSM) Tree-based key-valuestorage system have been widely used in many NoSQL systems,such as BigTable,Cassandra,HBase,etc.Therefore,how to construct a fast LSM tree key-value storage system based on SSD-SMR hybrid storage is a research problem with great practical significance.This paper first modeled the performance model of LSM tree key-value storage system based on SSD-SMR hybrid sto-rage,and then designed a performance-optimized LSM tree key-value storage system and implemented it based on Le-velDB.The evaluation results indicate that the system based on SSD-SMR hybrid storage improves the random-write performance by 20% and improves random-read performance by 6 times coupled with only a very small SSD (i.e.,0.4%~2% of disk capacity) compared with the HDD-based solution.

Key words: Big data, Flash, Hybrid storage, LSM tree, SMR HDD

中图分类号: 

  • TP333
[1]TURNER V,GANTZ J F,REINSEL D,et al.The digital universe of opportunities:Rich data and the increasing value of the internet of things[Z].IDC Analyze the Future,2014.
[2]GRAWINKEL M,NAGEL L,PADUA F,et al.Analysis of the ECMWF storage landscape[C]∥Usenix Conference on File and Storage Technologies.USENIX Association,2015:15-27.
[3]Amazon Web service[EB/OL].https://aws.amazon.com.
[4]Microsoft azure[EB/OL].https://azure.microsoft.com.
[5]Aliyun.com:An integrated suite of cloud products,services and solutions[EB/OL].http://www.alicloud.com.
[6]Apache couchdb[EB/OL].http://couchdb.apache.org.
[7]Tokyo cabinet:A modern implementation of dbm[EB/OL].http://fallabs.com/tokyocabinet.
[8]Redis[EB/OL].https://redis.io.
[9]CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:a distri-buted storage system for structured data[J].Acm Transactions on Computer Systems,2008,26(2):1-26.
[10]Apache hbase[EB/OL].http://hbase.apache.org.
[11]LAKSHMAN A,MALIK P.Cassandra:a decentralized structured storage system[J].Acm Sigops Operating Systems Review,2010,44(2):35-40.
[12]Ssdb-a fast nosql database for storing big list of data[EB/OL].https://github.com/ideawu/ssdb.
[13]AMER A,LONG D D E,MILLER E L,et al.Design issues for a shingled write disk system[C]∥IEEE,Symposium on MASS Storage Systems and Technologies.IEEE Computer Society.2010:1-12.
[14]AMER A,HOLLIDAY J A,DE LONG D,et al.Data management and layout for shingled magnetic recording[J].IEEE Transactions on Magnetics,2011,47(10):3691-3697.
[15]Leveldb:a fast and lightweight key/value database library by google[EB/OL].http://code.google.com/p/leveldb.
[16]FCOOPER B,SILBERSTEIN A,TAM E,et al.Benchmarking cloud serving systems with ycsb[C]∥ACM Symposium on Cloud Computing.2010:143-154.
[17]PITCHUMANI R,HUGHES J,MILLER E L.SMRDB:Key-Value Data Store for Shingled Magnetic Recording Disks[C]∥8th ACM International Systems and Storage Conference.2015.
[18]YAO T,WAN J G,HUANGY P,et al.A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores[C]∥33rd International Conference on Massive Storage Systems and Technology.2017.
[19]SAXENA M,SWIFT M M,ZHANG Y Y.Flashtier:a light-weight,consistent and durable storage cache[C]∥Proceedings of the 7th ACM european conference on Computer Systems.2012:267-280.
[20]KGIL T,ROBERTS D,MUDGE T.Improving NAND FlashBased Disk Caches[C]∥International Symposium on Computer Architecture.IEEE,2008:327-338.
[21]YANG Q,REN J.I-CASH:Intelligently Coupled Array of SSD and HDD[C]∥IEEE,International Symposium on High PERFORMANCE Computer Architecture.IEEE,2011:278-289.
[22]SRINIVASAN M,SAAB P.A general purpose,write-back block cache for linux[EB/OL].https://github.com/facebookarchive/flashcache.
[23]THORNBER M S J,MAUELSHAGEN H.dm-cache[EB/OL].https://en.wikipedia.org/wiki/Dm-cache.
[24]Emc fast cache:A detailed review[EB/OL].http://www.emc.com/collateral/software/white-papers/h8046-clariioncelerra-unified-fast-cache-wp.pdf.
[25]Exadata smart flash cache features and the oracle exadata databas machine[EB/OL].http://www.oracle.com/technetwork/serverstorage/engineer/d-systems/exadata/exadata-smart-flash-cache-366203.pdf.
[26]ZHOU Y Y,CHEN Z F,LI K.Second-level buffer cache management[J].IEEE Transactions on parallel and distributed systems,2004,15(6):505-519.
[27]JIANG S,ZHANG X.Lirs:An efficient low inter-reference recency set replacement policy to improve buffer cache performance[C]∥Proceeding of 2002 ACM SIGMETRICS.2002.
[28]NMEGIDDO,MODHA D.Arc:a self-tuning,low over-head replacement cach[C]∥Proceedings of the 2nd USENIX Sympo-sium on File and Storage Technologies.2003.
[29]PRITCHETT T,THOTTETHODI M.SieveStore:a highly-selective,ensemble-level disk cache for cost-performance[C]∥International Symposium on Computer Architecture.ACM,2010:163-174.
[30]HUANG S,WEI Q,CHEN J,et al.Improving flash-based disk cache with lazy adaptive replacement[C]∥Proceedings of the 29th International Conference on Massive Storage Systems and Technology.2013.
[31]GREGG B.L2arc[EB/OL].https://blogs.oracle.com/bren-dan/entry/test.
[32]Under the hood:Building and open-sourcing rocksdb[EB/OL].http://goo.gl/9xulVB.
[33]ZHANG Z,YUE Y,HE B,et al.Pipelined Compaction for the LSM-Tree[C]∥IEEE,International Parallel and Distributed Processing Symposium.IEEE Computer Society,2014:777-786.
[34]WANG P,SUN G Y,JIANG S,et al.An efficient design and implementation of lsm-tree based key-value store on open-channel ssd[C]∥Proceedings of the Ninth European Conference on Computer Systems.2014.
[1] 陈晶, 吴玲玲.
多源异构环境下的车联网大数据混合属性特征检测方法
Mixed Attribute Feature Detection Method of Internet of Vehicles Big Datain Multi-source Heterogeneous Environment
计算机科学, 2022, 49(8): 108-112. https://doi.org/10.11896/jsjkx.220300273
[2] 何强, 尹震宇, 黄敏, 王兴伟, 王源田, 崔硕, 赵勇.
基于大数据的进化网络影响力分析研究综述
Survey of Influence Analysis of Evolutionary Network Based on Big Data
计算机科学, 2022, 49(8): 1-11. https://doi.org/10.11896/jsjkx.210700240
[3] 孙轩, 王焕骁.
政务大数据安全防护能力建设:基于技术和管理视角的探讨
Capability Building for Government Big Data Safety Protection:Discussions from Technologicaland Management Perspectives
计算机科学, 2022, 49(4): 67-73. https://doi.org/10.11896/jsjkx.211000010
[4] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[5] 王俊, 王修来, 庞威, 赵鸿飞.
面向科技前瞻预测的大数据治理研究
Research on Big Data Governance for Science and Technology Forecast
计算机科学, 2021, 48(9): 36-42. https://doi.org/10.11896/jsjkx.210500207
[6] 余乐章, 夏天宇, 荆一楠, 何震瀛, 王晓阳.
面向大数据分析的智能交互向导系统
Smart Interactive Guide System for Big Data Analytics
计算机科学, 2021, 48(9): 110-117. https://doi.org/10.11896/jsjkx.200900083
[7] 王立梅, 朱旭光, 汪德嘉, 张勇, 邢春晓.
基于深度学习的民事案件判决结果分类方法研究
Study on Judicial Data Classification Method Based on Natural Language Processing Technologies
计算机科学, 2021, 48(8): 80-85. https://doi.org/10.11896/jsjkx.210300130
[8] 王雪岑, 张昱, 刘迎婕, 于戈.
基于表示学习的在线学习交互质量评价方法
Evaluation of Quality of Interaction in Online Learning Based on Representation Learning
计算机科学, 2021, 48(2): 207-211. https://doi.org/10.11896/jsjkx.201000042
[9] 滕建, 滕飞, 李天瑞.
基于3D卷积和LSTM编码解码的出行需求预测
Travel Demand Forecasting Based on 3D Convolution and LSTM Encoder-Decoder
计算机科学, 2021, 48(12): 195-203. https://doi.org/10.11896/jsjkx.210400022
[10] 张育龙, 王强, 陈明康, 孙静涛.
图像去雨算法在云物联网应用中的研究综述
Survey of Intelligent Rain Removal Algorithms for Cloud-IoT Systems
计算机科学, 2021, 48(12): 231-242. https://doi.org/10.11896/jsjkx.201000055
[11] 曹萌, 于洋, 梁英, 史红周.
基于区块链的大数据交易关键技术与发展趋势
Key Technologies and Development Trends of Big Data Trade Based on Blockchain
计算机科学, 2021, 48(11A): 184-190. https://doi.org/10.11896/jsjkx.210100163
[12] 刘亚臣, 黄雪莹.
卫星监测时空大数据蠕变特征提取及预警算法
Research on Creep Feature Extraction and Early Warning Algorithm Based on Satellite MonitoringSpatial-Temporal Big Data
计算机科学, 2021, 48(11A): 258-264. https://doi.org/10.11896/jsjkx.201000071
[13] 张光君, 张翔.
应用“大数据+区块链”优化立法评估制度的机理与路径
Mechanism and Path of Optimizing Institution of Legislative Evaluation by Applying “Big Data+Blockchain”
计算机科学, 2021, 48(10): 324-333. https://doi.org/10.11896/jsjkx.201200105
[14] 叶雅珍, 刘国华, 朱扬勇.
数据产品流通的两阶段授权模式
Two-step Authorization Pattern of Data Product Circulation
计算机科学, 2021, 48(1): 119-124. https://doi.org/10.11896/jsjkx.191100217
[15] 赵会群, 吴凯锋.
一种大数据估价算法
Big Data Valuation Algorithm
计算机科学, 2020, 47(9): 110-116. https://doi.org/10.11896/jsjkx.191000156
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!