Computer Science, 2018, Vol. 45, Issue (7): 61-65. doi: 10.11896/j.issn.1002-137X.2018.07.009

• NCIS 2017 •

Performance Optimization of LSM Tree Key-value Storage System Based on SSD-SMR Hybrid Storage

WANG Yang-yang, WEI Hao-cheng, CHAI Yun-peng

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received: 2017-07-16 Online: 2018-07-30 Published: 2018-07-30

Abstract: Big data places higher demands on the scalability, performance, and cost of storage systems, and Shingled Magnetic Recording (SMR) disks are widely used in big data storage systems because of their high storage density and low cost. However, since the random-write performance of SMR disks is usually weak, hybrid storage consisting of SMR disks and fast Flash-based Solid State Drives (SSDs) can improve performance significantly. Meanwhile, write-optimized Log-Structured Merge (LSM) tree-based key-value storage systems have been widely used in many NoSQL systems, such as BigTable, Cassandra, and HBase. Therefore, how to construct a fast LSM tree key-value storage system on SSD-SMR hybrid storage is a research problem of great practical significance. This paper first builds a performance model of an LSM tree key-value storage system based on SSD-SMR hybrid storage, and then designs a performance-optimized LSM tree key-value storage system and implements it on top of LevelDB. The evaluation results indicate that, compared with the HDD-based solution, the system based on SSD-SMR hybrid storage improves random-write performance by 20% and random-read performance by 6 times while using only a very small SSD (i.e., 0.4%~2% of the disk capacity).

Key words: Big data, Flash, Hybrid storage, LSM tree, SMR HDD

CLC Number: 

  • TP333