计算机科学 ›› 2023, Vol. 50 ›› Issue (6A): 220600256-6.doi: 10.11896/jsjkx.220600256
陈珊珊, 高隽, 马振禹
CHEN Shanshan1, GAO Jun2, MA Zhenyu3
摘要: 在大数据时代,数据访问速度是衡量大规模存储系统性能的一个重要指标,而索引是用于提升数据库系统中数据存取性能的主要技术之一。近几年,使用机器学习模型代替B+树等传统索引,拟合数据分布规律,将数据的间接查找优化为函数直接计算的学习索引(Learned Index,LI)被提出,LI提高了查询的速度,减少了索引空间开销。但是LI的拟合误差较大,不支持插入等修改性操作。文中提出了一种利用梯度下降算法拟合数据的学习索引模型GDLIN(A Learned Index By Gradient Descent)。GDLIN利用梯度下降算法更好地拟合数据,减少拟合误差,缩短本地查找的时间;同时递归调用数据拟合算法,充分利用键的分布规律,构建上层结构,避免索引结构随着数据量而增大。另外,GDLIN利用链表解决LI不支持数据插入的问题。实验结果表明,GDLIN在无新数据插入的情况下,吞吐量是B+树的2.1倍;在插入操作占比为50%的情况下,是LI的1.08倍。
中图分类号:
[1]BAYER R,UNTERAUER K.Prefix B-trees[J].ACM Transactions on Database Systems,1977,2(1):11-26. [2]STONEBRAKER M.The Case for Partial Indexes[J].SIGMOD Record,1989,18(4):4-11. [3]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.FI-Ting-Tree:A Data-aware Index Structure[C]//The 2019 International Conference.2019:1189-1206. [4]ZHOU X,CHAI C,LI G,et al.Database meets artificial intelligence:a survey[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(3):1096-1116. [5]CHEN L,GAO Y,LI X,et al.Efficient metric indexing for simi-larity search and similarity joins[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(3):556-571. [6]FU T C.A review on time series data mining[J].Engineering Applications of Artificial Intelligence,2011,24(1):164-181. [7]EICHINGER F,EFROS P,KARNOUSKOS S,et al.A time-series compression technique and its application to the smart grid[J].VLDB Journal,2015,24(2):193-218. [8]ESCH R E,EASTMAN W L.Computational methods for bestspline function approximation[J].Journal of Approximation Theory,1969,2(1):85-96. [9]KEOGH E,CHU S,HART D,et al.An online algorithm forsegmenting time series[C]//Proceedings 2001 IEEE International Conference on Data Mining.IEEE,2002. [10]LIU XY,LIN Z,WANG H.Novel online methods for time series segmentation[J].IEEE Transactions on Knowledge & Data Engineering,2008,20(12):1616-1626. [11]NEUMANN T,MICHEL S.Smooth interpolating histogramswith error guarantees[C]//British National Conference on Databases.2008. [12]SHATKAY H,ZDONIK S B.Approximate queries and repre-sentations for large data sequences[C]//ICDE.2019:536-545. [13]XU Z,ZHANG R,RAMAMOHANARAO K,et al.An adaptive algorithm for online time series segmentation with error bound guarantee[C]//EDBT.2012:192-203. [14]WANG L,WANG W J,JIANG G X.Optimization for Smoo-thing Parameter in Process of Data Fitting[J].Computer Science,2015,42(9):226-229. [15]KRASKA T,BEUTEL A,CHI E H,et al.The case for learned index structures[C]//Proceedings of the International Confe-rence on Management of Data.2018:489-504. [16]ZHANG H,ANDERSEN D G,PAVLO A,et al.Reducing the storage overhead of main-memory OLTP databases with hybrid indexes[C]//International Conference on Management of Data.ACM,2016. [17]ZUKOWSKI M,HEMAN S,NES N,et al.Super-scalar RAM-CPU cache compression[C]//International Conference on Data Engineering.IEEE,2006. [18]BÖHM M,SCHLEGEL B,VOLK P B,et al.Efficient In-Memory indexing with generalized prefix tees[C]//DBLP.2011. [19]RAO J,ROSS K A.Cache conscious indexing for Decision-Support in main memory[C]//Proceedings of the International Conference on Very Large Data(VLDB).Morgan Kaufmann/ACM,1999:78-89. [20]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.A-Tree:a bounded approximate index structure[J].arXiv:1801.10207,2018. [21]ATHANASSOULIS M,AILAMAKI A.BF-Tree:approximate tree indexing[C]//International Conference on Very Large Databases.2014. [22]HWANG D,KIM W H,WON Y,et al.Endurable transient inconsistency in byte-addressable persistent B+-tree[C]//16th USENIX Conference on File and Storage Technologies.2018:187. [23]RAO J,ROSS K A.Making B+-trees cache conscious in main memory[C]//Proceedings of the Conference on Management of Data(SIGMOD).ACM,2000:475-486. [24]CHEN S,GIBBONS P B,MOWRY T C.Improving index performance through prefetching[J].ACM SIGMOD Record,2002,30(2):235-246. [25]SHAZEER N,MIRHOSEINI A,MAZIARZ K,et al.Outra-geously large neural networks:the sparsely-gated mixture-of-experts layer[C]//Proceedings of the International Conference.on Learning Representations.2017. [26]BRAESS D.Chebyshev approximation by spline functions with free knots[J].IMA J NUMER ANAL,1992,12(5):357-366. [27]COOPER B F,SILBERSTEIN A,TAM E,et al.Benchmarking cloud serving systems withYCSB[C]//Proceedings of the 1st ACM Symposium on Cloud Computing.2010:143-154. |
|