计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 61-65.doi: 10.11896/j.issn.1002-137X.2018.07.009
王洋洋,韦皓诚,柴云鹏
WANG Yang-yang,WEI Hao-cheng,CHAI Yun-peng
摘要: 大数据对存储系统的可扩展性、性能和成本等方面提出了更高的要求。瓦记录(Shingled Magnetic Recor-ding,SMR)硬盘由于存储密度高、价格便宜,正逐步被广泛应用于大数据存储系统。但是,SMR硬盘的随机写性能较差,与快速的基于闪存的固态硬盘(Solid State Drive,SSD)一起构成混合存储时可以显著提升性能。同时,基于写优化的日志结构合并(Log-Structured Merge,LSM)树的键值存储已被广泛应用于许多NoSQL系统,如BigTable,Cassandra和HBase等。因此,如何基于新型的SSD-SMR混合存储构建出高性能的LSM树键值存储系统是一个具有很大研究价值的问题。首先建立基于SSD-SMR混合存储的LSM树键值系统的性能模型,然后针对SSD和SMR的硬件特征以及LSM树键值存储的软件特点,设计了一套面向SSD-SMR混合存储进行性能优化的LSM树键值存储系统,并基于LevelDB实现了该系统。在仅仅使用0.4%~2%空间的SSD的情况下,所提方法可以使SSD-SMR混合存储方案比普通磁盘方案的随机写性能提高20%,随机读性能提高5倍。
中图分类号:
[1]TURNER V,GANTZ J F,REINSEL D,et al.The digital universe of opportunities:Rich data and the increasing value of the internet of things[Z].IDC Analyze the Future,2014. [2]GRAWINKEL M,NAGEL L,PADUA F,et al.Analysis of the ECMWF storage landscape[C]∥Usenix Conference on File and Storage Technologies.USENIX Association,2015:15-27. [3]Amazon Web service[EB/OL].https://aws.amazon.com. [4]Microsoft azure[EB/OL].https://azure.microsoft.com. [5]Aliyun.com:An integrated suite of cloud products,services and solutions[EB/OL].http://www.alicloud.com. [6]Apache couchdb[EB/OL].http://couchdb.apache.org. [7]Tokyo cabinet:A modern implementation of dbm[EB/OL].http://fallabs.com/tokyocabinet. [8]Redis[EB/OL].https://redis.io. [9]CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:a distri-buted storage system for structured data[J].Acm Transactions on Computer Systems,2008,26(2):1-26. [10]Apache hbase[EB/OL].http://hbase.apache.org. [11]LAKSHMAN A,MALIK P.Cassandra:a decentralized structured storage system[J].Acm Sigops Operating Systems Review,2010,44(2):35-40. [12]Ssdb-a fast nosql database for storing big list of data[EB/OL].https://github.com/ideawu/ssdb. [13]AMER A,LONG D D E,MILLER E L,et al.Design issues for a shingled write disk system[C]∥IEEE,Symposium on MASS Storage Systems and Technologies.IEEE Computer Society.2010:1-12. [14]AMER A,HOLLIDAY J A,DE LONG D,et al.Data management and layout for shingled magnetic recording[J].IEEE Transactions on Magnetics,2011,47(10):3691-3697. [15]Leveldb:a fast and lightweight key/value database library by google[EB/OL].http://code.google.com/p/leveldb. [16]FCOOPER B,SILBERSTEIN A,TAM E,et al.Benchmarking cloud serving systems with ycsb[C]∥ACM Symposium on Cloud Computing.2010:143-154. [17]PITCHUMANI R,HUGHES J,MILLER E L.SMRDB:Key-Value Data Store for Shingled Magnetic Recording Disks[C]∥8th ACM International Systems and Storage Conference.2015. [18]YAO T,WAN J G,HUANGY P,et al.A Light-weight Compaction Tree to Reduce I/O Amplification toward Efficient Key-Value Stores[C]∥33rd International Conference on Massive Storage Systems and Technology.2017. [19]SAXENA M,SWIFT M M,ZHANG Y Y.Flashtier:a light-weight,consistent and durable storage cache[C]∥Proceedings of the 7th ACM european conference on Computer Systems.2012:267-280. [20]KGIL T,ROBERTS D,MUDGE T.Improving NAND FlashBased Disk Caches[C]∥International Symposium on Computer Architecture.IEEE,2008:327-338. [21]YANG Q,REN J.I-CASH:Intelligently Coupled Array of SSD and HDD[C]∥IEEE,International Symposium on High PERFORMANCE Computer Architecture.IEEE,2011:278-289. [22]SRINIVASAN M,SAAB P.A general purpose,write-back block cache for linux[EB/OL].https://github.com/facebookarchive/flashcache. [23]THORNBER M S J,MAUELSHAGEN H.dm-cache[EB/OL].https://en.wikipedia.org/wiki/Dm-cache. [24]Emc fast cache:A detailed review[EB/OL].http://www.emc.com/collateral/software/white-papers/h8046-clariioncelerra-unified-fast-cache-wp.pdf. [25]Exadata smart flash cache features and the oracle exadata databas machine[EB/OL].http://www.oracle.com/technetwork/serverstorage/engineer/d-systems/exadata/exadata-smart-flash-cache-366203.pdf. [26]ZHOU Y Y,CHEN Z F,LI K.Second-level buffer cache management[J].IEEE Transactions on parallel and distributed systems,2004,15(6):505-519. [27]JIANG S,ZHANG X.Lirs:An efficient low inter-reference recency set replacement policy to improve buffer cache performance[C]∥Proceeding of 2002 ACM SIGMETRICS.2002. [28]NMEGIDDO,MODHA D.Arc:a self-tuning,low over-head replacement cach[C]∥Proceedings of the 2nd USENIX Sympo-sium on File and Storage Technologies.2003. [29]PRITCHETT T,THOTTETHODI M.SieveStore:a highly-selective,ensemble-level disk cache for cost-performance[C]∥International Symposium on Computer Architecture.ACM,2010:163-174. [30]HUANG S,WEI Q,CHEN J,et al.Improving flash-based disk cache with lazy adaptive replacement[C]∥Proceedings of the 29th International Conference on Massive Storage Systems and Technology.2013. [31]GREGG B.L2arc[EB/OL].https://blogs.oracle.com/bren-dan/entry/test. [32]Under the hood:Building and open-sourcing rocksdb[EB/OL].http://goo.gl/9xulVB. [33]ZHANG Z,YUE Y,HE B,et al.Pipelined Compaction for the LSM-Tree[C]∥IEEE,International Parallel and Distributed Processing Symposium.IEEE Computer Society,2014:777-786. [34]WANG P,SUN G Y,JIANG S,et al.An efficient design and implementation of lsm-tree based key-value store on open-channel ssd[C]∥Proceedings of the Ninth European Conference on Computer Systems.2014. |
[1] | 陈晶, 吴玲玲. 多源异构环境下的车联网大数据混合属性特征检测方法 Mixed Attribute Feature Detection Method of Internet of Vehicles Big Datain Multi-source Heterogeneous Environment 计算机科学, 2022, 49(8): 108-112. https://doi.org/10.11896/jsjkx.220300273 |
[2] | 何强, 尹震宇, 黄敏, 王兴伟, 王源田, 崔硕, 赵勇. 基于大数据的进化网络影响力分析研究综述 Survey of Influence Analysis of Evolutionary Network Based on Big Data 计算机科学, 2022, 49(8): 1-11. https://doi.org/10.11896/jsjkx.210700240 |
[3] | 孙轩, 王焕骁. 政务大数据安全防护能力建设:基于技术和管理视角的探讨 Capability Building for Government Big Data Safety Protection:Discussions from Technologicaland Management Perspectives 计算机科学, 2022, 49(4): 67-73. https://doi.org/10.11896/jsjkx.211000010 |
[4] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[5] | 王俊, 王修来, 庞威, 赵鸿飞. 面向科技前瞻预测的大数据治理研究 Research on Big Data Governance for Science and Technology Forecast 计算机科学, 2021, 48(9): 36-42. https://doi.org/10.11896/jsjkx.210500207 |
[6] | 余乐章, 夏天宇, 荆一楠, 何震瀛, 王晓阳. 面向大数据分析的智能交互向导系统 Smart Interactive Guide System for Big Data Analytics 计算机科学, 2021, 48(9): 110-117. https://doi.org/10.11896/jsjkx.200900083 |
[7] | 王立梅, 朱旭光, 汪德嘉, 张勇, 邢春晓. 基于深度学习的民事案件判决结果分类方法研究 Study on Judicial Data Classification Method Based on Natural Language Processing Technologies 计算机科学, 2021, 48(8): 80-85. https://doi.org/10.11896/jsjkx.210300130 |
[8] | 王雪岑, 张昱, 刘迎婕, 于戈. 基于表示学习的在线学习交互质量评价方法 Evaluation of Quality of Interaction in Online Learning Based on Representation Learning 计算机科学, 2021, 48(2): 207-211. https://doi.org/10.11896/jsjkx.201000042 |
[9] | 滕建, 滕飞, 李天瑞. 基于3D卷积和LSTM编码解码的出行需求预测 Travel Demand Forecasting Based on 3D Convolution and LSTM Encoder-Decoder 计算机科学, 2021, 48(12): 195-203. https://doi.org/10.11896/jsjkx.210400022 |
[10] | 张育龙, 王强, 陈明康, 孙静涛. 图像去雨算法在云物联网应用中的研究综述 Survey of Intelligent Rain Removal Algorithms for Cloud-IoT Systems 计算机科学, 2021, 48(12): 231-242. https://doi.org/10.11896/jsjkx.201000055 |
[11] | 曹萌, 于洋, 梁英, 史红周. 基于区块链的大数据交易关键技术与发展趋势 Key Technologies and Development Trends of Big Data Trade Based on Blockchain 计算机科学, 2021, 48(11A): 184-190. https://doi.org/10.11896/jsjkx.210100163 |
[12] | 刘亚臣, 黄雪莹. 卫星监测时空大数据蠕变特征提取及预警算法 Research on Creep Feature Extraction and Early Warning Algorithm Based on Satellite MonitoringSpatial-Temporal Big Data 计算机科学, 2021, 48(11A): 258-264. https://doi.org/10.11896/jsjkx.201000071 |
[13] | 张光君, 张翔. 应用“大数据+区块链”优化立法评估制度的机理与路径 Mechanism and Path of Optimizing Institution of Legislative Evaluation by Applying “Big Data+Blockchain” 计算机科学, 2021, 48(10): 324-333. https://doi.org/10.11896/jsjkx.201200105 |
[14] | 叶雅珍, 刘国华, 朱扬勇. 数据产品流通的两阶段授权模式 Two-step Authorization Pattern of Data Product Circulation 计算机科学, 2021, 48(1): 119-124. https://doi.org/10.11896/jsjkx.191100217 |
[15] | 赵会群, 吴凯锋. 一种大数据估价算法 Big Data Valuation Algorithm 计算机科学, 2020, 47(9): 110-116. https://doi.org/10.11896/jsjkx.191000156 |
|