Computer Science ›› 2023, Vol. 50 ›› Issue (1): 1-8.doi: 10.11896/jsjkx.211000149

• Database & Big Data & Data Science • Previous Articles     Next Articles

Survey of Learned Index

WANG Yitan1, WANG Yishu1, YUAN Ye2   

  1. 1 School of Computer Science and Engineering,Northeastern University,Shenyang 110169,China
    2 School of Computer Science and Technology,Beijing Institute of Technology,Beijing 100081,China
  • Received:2021-10-20 Revised:2022-06-07 Online:2023-01-15 Published:2023-01-09
  • About author:WANG Yitan,born in 1997,postgra-duate.Her main research interests include data structure and temporal graph.
    YUAN Ye,born in 1981,Ph.D,professor,Ph.D supervisor.His main research interests include graph databases,pro-babilistic databases,data privacy-preserving,and cloud computing.
  • Supported by:
    National Key Research and Development Program of China(2022YFB2702100) and National Natural Science Foundation of China(61932004,62225203,U21A20516).

Abstract: Due to the explosive growth of data in the era of big data,it is difficult for the traditional index structures to handle this huge and complex data.In order to solve this problem,the learned index has emerged and become one of the most popular research topics in the database.Learned indexes employ machine learning models for index construction.By training and learning the relationship between data and physical location,the learning model can be obtained so as to master the distribution characte-ristics between the two to realize the improvement and optimization of the traditional index.Extensive experiments show that learned indexes can adapt to large-scale data sets,and provide better search performance with lower memory requirements than traditional indexes.This paper introduces the applications of learned indexes and reviews the existing learned index models.According to data types,learned indexes are divided into two categories:one-dimensional and multi-dimensional.The advantages,disadvantages,and supported searches of learned index models in each category are also introduced and analyzed in detail.Finally,some future research directions of learned indexes are prospected to provide references for related researches.

Key words: Learned index, Machine learning, Index construction, Data structure, Database

CLC Number: 

  • TP311
[1]PAGH R,RODLER F F.Cuckoo Hashing[J].Journal of Algorithms,2004,51(2):122-144.
[2]ATHANASSOULIS M,AILAMAKI A.BF-Tree:ApproximateTree Indexing[C]//International Conference on Very Large Databases.2014:1881-1892.
[3]WITTEN I H,MOFFAT A,BELL T C.Managing Gigabytes:Compressing and Indexing Documents and Images[J].SIGMOD Record,1999,33(2):113-114.
[4]AREF W G,BARBARÁ D,VALLABHANENI P,et al.Thehandwritten trie:indexing electronic ink[C]//ACM Sigmod International Conference on Management of Data.1995:151-162.
[5]WANG J,LIN C,PAPAKONSTANTINOU Y,et al.An Expe-rimental Study of Bitmap Compression vs.Inverted List Compression[C]//the 2017 ACM International Conference.2017:993-1008.
[6]BENDER M A,DEMAINE E D,FARACH-COLTON M.Cache-oblivious B-trees[J].Siam Journal on Computing,2000,35(2):300-409.
[7]JENSEN C S,LIN D,OOI B C.Query and update efficient B+-tree based indexing of moving objects[C]//Proceedings of the Thirtieth International Conference on Very Large Data bases-Volume 30.2004:768-779.
[8]O'NEIL P,CHENG E,GAWLICK D,et al.The log-structured merge-tree(LSM-tree)[J].Acta Informatica,1996,33(4):351-385.
[9]KRASKA T,BEUTEL A,CHI E H,et al.The Case for Lear-ned Index Structures[C]//Proceedings of the 2018 International Conference on Management of Data.2018:489-504.
[10]GALAKATOS A,MARKOVITCH M,BINNIG C,et al.FI-Ting-Tree:A Data-aware Index Structure[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1189-1206.
[11]LI P,HUA Y,ZUO P,et al.A Scalable Learned Index Scheme in Storage Systems[J].arXiv:2102.06789,2019.
[12]LI P,LU H,ZHENG Q,et al.LISA:A Learned Index Structure for Spatial Data[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:2119-2133.
[13]HIGUCHI S,TAKEMASA J,KOIZUMI Y,et al.Feasibility of longest prefix matching using learned index structures[J].ACM SIGMETRICS Performance Evaluation Review,2021,48(4):45-48.
[14]RAMADHAN H,KWON J.Enhancing Learned Index for aHigher Recall Trajectory K-Nearest Neighbor Search[C]//2021 IEEE International Conference on Big Data(Big Data).2021:6006-6007.
[15]KIPF A,MARCUS R,VAN RENEN A,et al.RadixSpline:asingle-pass learned index[C]//Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management.2020:1-5.
[16]STOIAN M,KIPF A,MARCUS R,et al.PLEX:Towards Practical Learned Indexing[J].arXiv:2108.05117,2021.
[17]KRASKA T,ALIZADEH M.SageDB:A Learned Database System[C]//9th Biennial Conference on Innovative Data Systems Research.2019:13-16.
[18]WANG H X,FU X Y,XU J L,et al.:Learned Index for Spatial Queries[C]//2019 20th International Conference on Mobile Data Management.2019:569-574.
[19]DAVITKOVA A,MILCHEVSKI E,MICHEL S.The ML-Index:A Multidimensional,Learned Index for Point,Range,and Nearest-Neighbor Queries[C]//EDBT.2020:407-410.
[20]NATHAN V,DING J,ALIZADEH M,et al.Learning Multi-dimensional Indexes[C]//Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data.2020:985-1000.
[21]DING J,NATHAN V,ALIZADEH M,et al.Tsunami:A Lear-ned Multi-dimensional Index for Correlated Data and Skewed Workloads[J].Proceedings of the VLDB Endowment,2020,14(2):74-86.
[22]ZHANG S,RAY S,LU R,et al.Spatial Interpolation-based Learned Index for Range and kNN Queries[J].arXiv:2102.06789,2012.
[23]FOLMER M,NEUMANN R,KARUNANITHI T.A LearnedBucket Index Supporting Spatial Queries[D].Aalborg:Aalborg University,2019.
[24]DING J,MINHAS U F,YU J,et al.ALEX:An UpdatableAdaptive Learned Index[C]//International Conference on Ma-nagement of Data Conference.2020:969-984.
[25]WU J,ZHANG Y,CHEN S,et al.Updatable Learned Indexwith Precise Positions[J].Proceedings of the VLDB Endowment,2021,14(8):1276-1288.
[26]WU Y J,YU J,TIAN Y Y,et al.:Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1223-1240.
[27]ZHANG J,GAO Y.CARMI:A Cache-Aware Learned Indexwith a Cost-based Construction Algorithm[J].arXiv:2103.00858,2021.
[28]TANG C,WANG Y,DONG Z,et al.XIndex:A Scalable Lear-ned Index for Multicore Data Storage[C]//Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.2020:308-320.
[29]FERRAGINA P,VINCIGUERRA G.The PGM-index:a fully-dynamic compressed learned index with provable worst-case bounds[J].Proceedings of the VLDB Endowment,2020,13(8):1162-1175.
[30]WANG Y,TANG C,WANG Z,et al.SIndex:a scalable learned index for string keys[C]//Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems.2020:17-24.
[31]MCKENNEY P,KLEEN A,KRIEGER O,et al.Read-Copy Update[C]//Ottawa Linux Symposium.2001:175-184.
[32]BRONSON N G,CASPER J,CHAFI H,et al.A practical concurrent binary search tree[J].ACM Sigplan Notices,2010,45(5):257-268.
[33]SANG K C,HWANG S,KIM K,et al.Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems[C]//Proceedings of 27th International Conference on Very Large Data Bases.2001:181-190.
[34]WU X,NI F,JIANG S,et al.Wormhole:A Fast Ordered Index for In-memory Data Management[C]//Proceedings of the Fourteenth EuroSys Conference.2019:1-16.
[35]MAO Y,KOHLER E,MORRIS R.Cache Craftiness for FastMulticore Key-Value Storage[C]//Proceedings of the 7th ACM European Confe-rence on Computer Systems.2012:183-196.
[36]ILYAS I F,MARKL V,HAAS P,et al.CORDS:Automatic discovery of correlations and soft functional dependencies[C]//Proceedings of the 2004 ACM SIGMOD International Confe-rence on Management of Data.2004:647-658.
[37]KIMURA H,HUO G,RASIN A,et al.Correlation maps:acompressed access method for exploiting soft functional depen-dencies[J].Proceedings of the VLDB Endowment,2009,2(1):1222-1233.
[38]LIU X,LIN Z,WANG H.Novel Online Methods for Time Series Segmentation[J].IEEE Transactions on Knowledge and Data Engineering,2008,20(12):1616-1626.
[39]XU Z,ZHANG R,KOTAGIRI R,et al.An adaptive algorithm for online time series segmentation with error bound guarantee[C]//Proceedings of the 15th International Conference on Extending Database Technology.2012:192-203.
[40]O'ROURKE J.An on-line algorithm for fitting straight lines between data ranges[J].Communications of the ACM,1981,24(9):574-578.
[41]OKANOHARA D,SADAKANE K.Practical entropy-com-pressed rank/select dictionary[C]//2007 Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments(ALENEX).2007:60-70.
[42]MOFFAT A,TURPIN A.Compression and coding algorithms[M].Berlin:Springer Science & Business Media,2002.
[43]NEUMANN T,MICHEL S.Smooth interpolating histogramswith error guarantees[C]//British National Conference on Databases.2008:126-138.
[44]MARCUS R,KIPF A,VAN RENEN A,et al.Benchmarking learned indexes[J].arXiv:2006.12804,2020.
[45]CROTTY A.Hist-Tree:Those Who Ignore It Are Doomed to Learn[C]//CIDR.2021:11-15.
[46]RAMSAK F,MARKL V,FENK R,et al.Integrating the UB-Tree into a Database System Kernel[C]//VLDB.2000:263-272.
[47]JAGADISH H V,OOI B C,TAN K L,et al.iDistance:An adaptive B+-tree based indexing method for nearest neighbor search[J].ACM Transactions on Database Systems(TODS),2005,30(2):364-397.
[48]SCHUH M A,WYLIE T,LIU C,et al.Approximating High-Dimensional Range Queries with kNN Indexing Techniques[C]//Berlin:Springer International Publishing.2014:369-380.
[49]BREIMAN L.Random forests[J].Machine Learning,2001,45(1):5-32.
[50]GAEDE V,GÜNTHER O.Multidimensional access methods[J].ACM Computing Surveys(CSUR),1998,30(2):170-231.
[51]NIEVERGELT J,HINTERBERGER H,SEVCIK K C.Thegrid file:An adaptable,symmetric multikey file structure[J].ACM Transactions on Database Systems(TODS),1984,9(1):38-71.
[52]MITAS L,MITASOVA H.Spatial interpolation[J].Geo-graphical Information Systems:Principles,Techniques,Management and Applications,1999,1(2):481-492.
[53]GUTTMAN A.R-trees:A dynamic index structure for spatial searching[C]//Proceedings of the 1984 ACM SIGMOD International Conference on Management of data.1984:47-57.
[54]ROYDEN H L,FITZPATRICK P.Real analysis[M].New York:Macmillan,1988.
[55]GARCIA E,GUPTA M.Lattice regression[J].Advances in Neural Information Processing Systems,2009,22:594-602.
[1] XU Xia, ZHANG Hui, YANG Chunming, LI Bo, ZHAO Xujian. Fair Method for Spectral Clustering to Improve Intra-cluster Fairness [J]. Computer Science, 2023, 50(2): 158-165.
[2] CHEN Depeng, LIU Xiao, CUI Jie, HE Daojing. Survey of Membership Inference Attacks for Machine Learning [J]. Computer Science, 2023, 50(1): 302-317.
[3] LENG Dian-dian, DU Peng, CHEN Jian-ting, XIANG Yang. Automated Container Terminal Oriented Travel Time Estimation of AGV [J]. Computer Science, 2022, 49(9): 208-214.
[4] NING Han-yang, MA Miao, YANG Bo, LIU Shi-chang. Research Progress and Analysis on Intelligent Cryptology [J]. Computer Science, 2022, 49(9): 288-296.
[5] ZHANG Guang-hua, GAO Tian-jiao, CHEN Zhen-guo, YU Nai-wen. Study on Malware Classification Based on N-Gram Static Analysis Technology [J]. Computer Science, 2022, 49(8): 336-343.
[6] HE Qiang, YIN Zhen-yu, HUANG Min, WANG Xing-wei, WANG Yuan-tian, CUI Shuo, ZHAO Yong. Survey of Influence Analysis of Evolutionary Network Based on Big Data [J]. Computer Science, 2022, 49(8): 1-11.
[7] WANG Run-an, ZOU Zhao-nian. Query Performance Prediction Based on Physical Operation-level Models [J]. Computer Science, 2022, 49(8): 49-55.
[8] LI Yao, LI Tao, LI Qi-fan, LIANG Jia-rui, Ibegbu Nnamdi JULIAN, CHEN Jun-jie, GUO Hao. Construction and Multi-feature Fusion Classification Research Based on Multi-scale Sparse Brain Functional Hyper-network [J]. Computer Science, 2022, 49(8): 257-266.
[9] CHEN Ming-xin, ZHANG Jun-bo, LI Tian-rui. Survey on Attacks and Defenses in Federated Learning [J]. Computer Science, 2022, 49(7): 310-323.
[10] XIAO Zhi-hong, HAN Ye-tong, ZOU Yong-pan. Study on Activity Recognition Based on Multi-source Data and Logical Reasoning [J]. Computer Science, 2022, 49(6A): 397-406.
[11] YAO Ye, ZHU Yi-an, QIAN Liang, JIA Yao, ZHANG Li-xiang, LIU Rui-liang. Android Malware Detection Method Based on Heterogeneous Model Fusion [J]. Computer Science, 2022, 49(6A): 508-515.
[12] LI Ya-ru, ZHANG Yu-lai, WANG Jia-chen. Survey on Bayesian Optimization Methods for Hyper-parameter Tuning [J]. Computer Science, 2022, 49(6A): 86-92.
[13] ZHAO Lu, YUAN Li-ming, HAO Kun. Review of Multi-instance Learning Algorithms [J]. Computer Science, 2022, 49(6A): 93-99.
[14] WANG Fei, HUANG Tao, YANG Ye. Study on Machine Learning Algorithms for Life Prediction of IGBT Devices Based on Stacking Multi-model Fusion [J]. Computer Science, 2022, 49(6A): 784-789.
[15] XU Jie, ZHU Yu-kun, XING Chun-xiao. Application of Machine Learning in Financial Asset Pricing:A Review [J]. Computer Science, 2022, 49(6): 276-286.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!