Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220600256-6.doi: 10.11896/jsjkx.220600256

• Big Data & Data Science • Previous Articles     Next Articles

GDLIN:A Learned Index By Gradient Descent

CHEN Shanshan1, GAO Jun2, MA Zhenyu3   

  1. School of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210003,China
  • Online:2023-06-10 Published:2023-06-12
  • About author:CHEN Shanshan,born in 1980,Ph.D,associate professor.Her main research interest is large-scale distributed sto-rage systems and architectures.
  • Supported by:
    National Natural Science Foundation of China(61972209).

Abstract: In the era of big data,data access speed is an important indicator to measure the performance of large-scale storage systems.Index is one of the main technologies to improve data access performance in database system.In recent years,learned index(LI) is proposed,which uses machine learning models instead of traditional B+-tree indexes,leverages pattern about the under-lying data distribution to train the models and optimize the indirect search of data query into the direct search of function calculation,learned index can speed up queries and reduce the size of an index.However,the fitting effect of LI is general,and it assumes that the data is static and read-only,it does not support modification operations such as insertion.This paper presents GDLIN,a novel form of a learned index,which uses gradient descent algorithm to fit the data.Gradient descent algorithm can reduce the error between the predict position and the actual position,which can reduce the cost of local research.Besides,GDLIN recursive calls the construction algorithm until only one model is created,which makes full use of keys’ distribution,and avoids the increase of the size of index with the data volume.In addition,GDLIN uses the sorted linked list to address the problem of data insertion.Experiment results demonstrate GDLIN improves the lookup throughput by 2.1× compared with the traditional B+-trees without insertion.Besides,GDLIN improves the lookup performance by 1.08× compared with the LI when the factor of insertion is 0.5.

Key words: Learned index, Gradient descent, Fitting data model, Linked list

CLC Number: 

  • TP391
[1]BAYER R,UNTERAUER K.Prefix B-trees[J].ACM Transactions on Database Systems,1977,2(1):11-26.
[2]STONEBRAKER M.The Case for Partial Indexes[J].SIGMOD Record,1989,18(4):4-11.
[3]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.FI-Ting-Tree:A Data-aware Index Structure[C]//The 2019 International Conference.2019:1189-1206.
[4]ZHOU X,CHAI C,LI G,et al.Database meets artificial intelligence:a survey[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(3):1096-1116.
[5]CHEN L,GAO Y,LI X,et al.Efficient metric indexing for simi-larity search and similarity joins[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(3):556-571.
[6]FU T C.A review on time series data mining[J].Engineering Applications of Artificial Intelligence,2011,24(1):164-181.
[7]EICHINGER F,EFROS P,KARNOUSKOS S,et al.A time-series compression technique and its application to the smart grid[J].VLDB Journal,2015,24(2):193-218.
[8]ESCH R E,EASTMAN W L.Computational methods for bestspline function approximation[J].Journal of Approximation Theory,1969,2(1):85-96.
[9]KEOGH E,CHU S,HART D,et al.An online algorithm forsegmenting time series[C]//Proceedings 2001 IEEE International Conference on Data Mining.IEEE,2002.
[10]LIU XY,LIN Z,WANG H.Novel online methods for time series segmentation[J].IEEE Transactions on Knowledge & Data Engineering,2008,20(12):1616-1626.
[11]NEUMANN T,MICHEL S.Smooth interpolating histogramswith error guarantees[C]//British National Conference on Databases.2008.
[12]SHATKAY H,ZDONIK S B.Approximate queries and repre-sentations for large data sequences[C]//ICDE.2019:536-545.
[13]XU Z,ZHANG R,RAMAMOHANARAO K,et al.An adaptive algorithm for online time series segmentation with error bound guarantee[C]//EDBT.2012:192-203.
[14]WANG L,WANG W J,JIANG G X.Optimization for Smoo-thing Parameter in Process of Data Fitting[J].Computer Science,2015,42(9):226-229.
[15]KRASKA T,BEUTEL A,CHI E H,et al.The case for learned index structures[C]//Proceedings of the International Confe-rence on Management of Data.2018:489-504.
[16]ZHANG H,ANDERSEN D G,PAVLO A,et al.Reducing the storage overhead of main-memory OLTP databases with hybrid indexes[C]//International Conference on Management of Data.ACM,2016.
[17]ZUKOWSKI M,HEMAN S,NES N,et al.Super-scalar RAM-CPU cache compression[C]//International Conference on Data Engineering.IEEE,2006.
[18]BÖHM M,SCHLEGEL B,VOLK P B,et al.Efficient In-Memory indexing with generalized prefix tees[C]//DBLP.2011.
[19]RAO J,ROSS K A.Cache conscious indexing for Decision-Support in main memory[C]//Proceedings of the International Conference on Very Large Data(VLDB).Morgan Kaufmann/ACM,1999:78-89.
[20]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.A-Tree:a bounded approximate index structure[J].arXiv:1801.10207,2018.
[21]ATHANASSOULIS M,AILAMAKI A.BF-Tree:approximate tree indexing[C]//International Conference on Very Large Databases.2014.
[22]HWANG D,KIM W H,WON Y,et al.Endurable transient inconsistency in byte-addressable persistent B+-tree[C]//16th USENIX Conference on File and Storage Technologies.2018:187.
[23]RAO J,ROSS K A.Making B+-trees cache conscious in main memory[C]//Proceedings of the Conference on Management of Data(SIGMOD).ACM,2000:475-486.
[24]CHEN S,GIBBONS P B,MOWRY T C.Improving index performance through prefetching[J].ACM SIGMOD Record,2002,30(2):235-246.
[25]SHAZEER N,MIRHOSEINI A,MAZIARZ K,et al.Outra-geously large neural networks:the sparsely-gated mixture-of-experts layer[C]//Proceedings of the International Conference.on Learning Representations.2017.
[26]BRAESS D.Chebyshev approximation by spline functions with free knots[J].IMA J NUMER ANAL,1992,12(5):357-366.
[27]COOPER B F,SILBERSTEIN A,TAM E,et al.Benchmarking cloud serving systems withYCSB[C]//Proceedings of the 1st ACM Symposium on Cloud Computing.2010:143-154.
[1] WANG Yitan, WANG Yishu, YUAN Ye. Survey of Learned Index [J]. Computer Science, 2023, 50(1): 1-8.
[2] YANG Li, LI Xin-yu, SHI Huai-feng, PAN Cheng-sheng. Task Intelligent Identification Method for Spatial Information Network [J]. Computer Science, 2020, 47(4): 262-269.
[3] LIU Xiao-tong,WANG Wei,LI Ze-yu,SHEN Si-wan,JIANG Xiao-ming. Recognition Algorithm of Red and White Cells in Urine Based on Improved BP Neural Network [J]. Computer Science, 2020, 47(2): 102-105.
[4] FENG Jin-zhan, CAI Shu-qin. Helpfulness Degree Prediction Model of Online Reviews Fusing Information Gain and Gradient Decline Algorithms [J]. Computer Science, 2020, 47(10): 69-74.
[5] ZHANG Xuan, JIANG Chao, LI Xiao-qiang, YAN Sha. Gradient Descent Bit-flipping Decoding Algorithm Based on Updating of Variable Nodes [J]. Computer Science, 2018, 45(8): 80-83.
[6] TAO Bing-mo,LU Shu-xia. Adaptive Stochastic Gradient Descent for Imbalanced Data Classification [J]. Computer Science, 2018, 45(6A): 487-492.
[7] HU Wen-jun,WANG Juan,WANG Pei-liang and WANG Shi-tong. Fast Model of Ensembling Linear Support Vector Machines Suitable for Large Datasets [J]. Computer Science, 2014, 41(5): 245-249.
[8] YE Di-qiu,CHENG Dong-nian and LI Yu-feng. NetFlow Based on Parallel Single Multi-linked Lists [J]. Computer Science, 2013, 40(9): 73-77.
[9] SHAO Jie,DU Li-juan and YANG Jing-yu. Applications of XCSG in Multi-robot Reinforcement Learning [J]. Computer Science, 2013, 40(8): 249-251.
[10] DANG Jian-wu,HANG Li-hua,WANG Yang-ping and DU Xiao-gang. 2D-3D Medical Image Registration Based on GPU [J]. Computer Science, 2013, 40(4): 306-309.
[11] . Optimization Model for Performance of Molecular Dynamics Simulation Based on Modified Cell-linked List Method [J]. Computer Science, 2013, 40(2): 12-15.
[12] YU Qing-jun,YU Bao-chu,TANG Zhen-an. Fast Color Quantization Algorithm Based on Linked List [J]. Computer Science, 2011, 38(2): 264-266.
[13] . [J]. Computer Science, 2008, 35(7): 96-98.
[14] . [J]. Computer Science, 2007, 34(3): 181-185.
[15] GUO Fang-Fang ,YANG Yong-Tian (School of Computer Science and Technology, Harbin Engineering University, Harbin 150001). [J]. Computer Science, 2006, 33(11): 34-37.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!