计算机科学 ›› 2025, Vol. 52 ›› Issue (2): 33-41.doi: 10.11896/jsjkx.240200001
许睿达1, 李永坤1,2, 许胤龙1,2
XU Ruida1, LI Yongkun1,2, XU Yinlong1,2
摘要: 在数据量飞速增长的大数据时代背景下,基于日志结构合并树的(Log-Structured Merge-Tree-based,LSM-Tree-based)键值存储系统因其优秀的灵活性与扩展性被广泛应用于NoSQL系统。但是,传统的LSM-Tree结构键值存储系统在查询数据时,因搜索多个SSTable引起的读放大问题会产生额外的I/O开销,影响系统性能。针对这一问题,提出了一种新型键值存储系统优化设计方案——FCLI-LSM。FCLI-LSM结合了细粒度键值对缓存和学习型索引的优化方法,旨在提升基于LSM-Tree结构的键值存储系统的查询性能。通过对数据访问热点的分析,FCLI-LSM对数据进行热、温、冷数据的三级分级。FCLI-LSM为热数据设计了基于键值分离的细粒度缓存机制,有效减少了读放大问题带来的额外I/O开销;此外,还设计了一种针对学习型索引的缓存亲和优化,进一步提高了存储系统对温数据的查询效率。实验结果表明,与现有的查询优化方案相比,FCLI-LSM能带来超过40%的平均查询时延下降以及超过1.7倍的系统吞吐率提升。
中图分类号:
[1]REINSEL D,GANTZ J,RYDNING J.Data age 2025:the digitization of the world from edge to core[J/OL].https://www.seagate.com/files/www-content/our-story/trends/files/idc-sea-gate-dataage-whitepaper.pdf. [2]O'NEIL P,CHENG E,GAWLICK D,et al.The log-structured merge-tree(LSM-tree)[J].Acta Informatica,1996,33:351-385. [3]GHEMAWAT S,DEAN J.LevelDB[EB/OL].https://git-hub.com/google/leveldb. [4]FACEBOOK.RocksDB[EB/OL].https://github.com/facebook/rocksdb. [5]LU L,PILLAI T S,GOPALAKRISHNAN H,et al.Wisckey:Separating keys from values in ssd-conscious storage[J].ACM Transactions on Storage(TOS),2017,13(1):1-28. [6]KRASKA T,BEUTEL A,CHI E H,et al.The case for learned index structures[C]//Proceedings of the 2018 International Conference on Management of Data.2018:489-504. [7]DAI Y,XU Y,GANESAN A,et al.From {WiscKey} to Bourbon:A Learned Index for {Log-Structured} Merge Trees[C]//14th USENIX Symposium on Operating Systems Design and Implementation(OSDI 20).2020:155-171. [8]ZHANG J,WANG F,DONG C.HaLSM:A Hotspot-awareLSM-tree based Key-Value Storage Engine[C]//2022 IEEE 40th International Conference on Computer Design(ICCD).IEEE,2022:179-186. [9]CAO Z,DONG S,VEMURI S,et al.Characterizing,modeling,and benchmarking {RocksDB}{Key-Value} workloads at facebook[C]//18th USENIX Conference on File and Storage Technologies(FAST 20).2020:209-223. [10]ATIKOGLU B,XU Y,FRACHTENBERG E,et al.Workloadanalysis of a large-scale key-value store[C]//Proceedings of the 12th ACM Sigmetrics/Performance Joint International Confe-rence on Measurement and Modeling of Computer Systems.2012:53-64. [11]COOPER B F,SILBERSTEIN A,TAM E,et al.Benchmarking cloud serving systems with YCSB[C]//Proceedings of the 1st ACM Symposium on Cloud Computing.2010:143-154. [12]COMER D.Ubiquitous B-tree[J].ACM Computing Surveys,1979,11(2):121-137. [13]SIVASUBRAMANIAN S.Amazon dynamoDB:a seamlesslyscalable non-relational database service[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.2012:729-730. [14]CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:A distri-buted storage system for structured data[J].ACM Transactions on Computer Systems(TOCS),2008,26(2):1-26. [15]LAKSHMAN A,MALIK P.Cassandra:a decentralized struc-tured storage system[J].ACM SIGOPS Operating Systems Review,2010,44(2):35-40. [16]PUGH W.Skip lists:a probabilistic alternative to balanced trees[J].Communications of the ACM,1990,33(6):668-676. [17]DGRAPH.BadgerDB[EB/OL].https://github.com/dgraph-io/badger. [18]PINGCAP.Titan[EB/OL].https://github.com/tikv/titan. [19]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.Fiting-tree:A data-aware index structure[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1189-1206. [20]DING J,MINHAS U F,YU J,et al.ALEX:an updatable adaptive learned index[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:969-984. [21]XIE Q,PANG C,ZHOU X,et al.Maximum error-boundedpiecewise linear representation for online stream approximation[J].The VLDB Journal,2014,23:915-937. [22]OPENSTREETMAP.OpenStreetMap[EB/OL].https://www.openstreetmap.org/. [23]TENG D,GUO L,LEE R,et al.LSbM-tree:Re-enabling buffer caching in data management for mixed reads and writes[C]//2017 IEEE 37th International Conference on Distributed Computing Systems(ICDCS).IEEE,2017:68-79. [24]WU F,YANG M H,ZHANG B,et al.{AC-Key}:Adaptive Ca-ching for {LSM-based}{Key-Value} Stores[C]//2020 USENIX Annual Technical Conference(USENIX ATC 20).2020:603-615. [25]YANG L,WU H,ZHANG T,et al.Leaper:A learned prefe-tcher for cache invalidation in LSM-tree based storage engines[J].Proceedings of the VLDB Endowment,2020,13(12):1976-1989. [26]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.Fiting-tree:A data-aware index structure[C]//Proceedings of the 2019 International Conference on Management of Da Data.2019:1189-1206. [27]FERRAGINA P,VINCIGUERRA G.The PGM-index:a fully-dynamic compressed learned index with provable worst-case bounds[J].Proceedings of the VLDB Endowment,2020,13(8):1162-1175. [28]KIPF A,MARCUS R,VAN RENEN A,et al.RadixSpline:asingle-pass learned index[C]//Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management.2020:1-5. [29]DAYAN N,ATHANASSOULIS M,IDREOS S.Monkey:Optimal navigable key-value store[C]//Proceedings of the 2017 ACM International Conference on Management of Da Data.2017:79-94. [30]LI Y,TIAN C,GUO F,et al.{ElasticBF}:Elastic Bloom Filter with Hotness Awareness for Boosting Read Performance in Large {Key-Value} Stores[C]//2019 USENIX Annual Technical Conference(USENIX ATC 19).2019:739-752. |
|