Computer Science ›› 2023, Vol. 50 ›› Issue (4): 32-39.doi: 10.11896/jsjkx.220600078

• Database & Big Data & Data Science • Previous Articles     Next Articles

LayerLSB:Nearest Neighbors Search Based on Layered Locality Sensitive B-tree

DING Jiwen, LIU Zhuojin, WANG Jiaxing, ZHANG Yanfeng, YU Ge   

  1. School of Computer Science and Engineering,Northeastern University,Shenyang 110167,China
  • Received:2022-06-08 Revised:2022-10-19 Online:2023-04-15 Published:2023-04-06
  • About author:DING Jiwen,born in 1982,postgra-duate,is a member of China Computer Federation.His main research interests include big data analysis and database.
    ZHANG Yanfeng,born in 1982,professor,Ph.D supervisor,is a senior member of China Computer Federation.His main research interests include big data mining,large scale machine learning,and distributed systems.
  • Supported by:
    National Natural Science Foundation of China(62072082) and Fundamental Research Funds for the Central Universities(N2216015).

Abstract: Nearest neighbor search has become a significant research problem due to its wide applications.Traditional spatial index structures such as R-tree and KD-tree can efficiently return accurate nearest neighbor search results in low-dimensional space,but they are not suitable for high-dimensional space.Locality sensitive B-tree(LSB) hashes data points to the sortable one-dimension values and arranges them in a tree-like structure,which dramatically improves the space and query efficiency of the previous locality sensitive hashing(LSH) implementations,without compromising the resulting quality.However,LSB fails to take data distribution into account.It performs well in a uniform data distribution setting,but exhibits unstable performance when the data are skewed.In response to this problem,this paper proposes LayerLSB,which reconstructs the hash values in a dense range by exploring the density of the hash values to make the distribution more uniform,so as to improve the query efficiency.Compared to LSB,LayerLSB indices become more targeted in terms of data distribution,and a multi-layered structure is constructed.Compared with the simple rehashing method,the multi-layered approach will still guarantee the search quality by carefully choosing the number of groups and hash functions.The results show that the query cost can be reduced to 44.6% of the original at most when achieving the same query accuracy.

Key words: Nearest neighbor search, Layered structure, Locality sensitive hashing, Locality sensitive B-tree

CLC Number: 

  • TP319
[1]GUTTMAN A.R-trees:a dynamic index structure for spatialsearching[C]//Proceedings of Acm Sigmod International Conference on Management of Data.New York,1984:47-57.
[2]KATAYAMA N.The SR-tree:an index structure for high-dimensional nearest neighbor queries[J].ACM Sigmod Record,1997,26(2):369-380.
[3]BENTLEY J L.Multidimensional binary search trees used for associative searching[J].Communications of the ACM,1975,18(9):509-517.
[4]WEBERR.A Quantitative Analysis and Performance Study for Similarity-Search Methods in high-Dimensional Spaces[C]//Proceedings of the VLDB Conference.New York,1998:194-205.
[5]TAO Y,KE Y,CHENG S,et al.Quality and efficiency in high dimensional nearest neighbor search[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data.Rhode Island,USA,2009:563-576.
[6]YAO B,LI F,KUMAR P.K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free[C]//Proceedings of the 2010 IEEE 26th International Conference on Data Engineering.2010:1103-1114.
[7]CHI Z,LI F,JESTES J.Efficient parallel kNN joins for largedata in MapReduce[C]//Proceedings of the Extending Database Technology.ACM,2012:38-49.
[8]INDYK P.Approximate Nearest Neighbors:Towards Removing the Curse of Dimensionality[C]//Proceedings of the 30th ACM Symposium on Theory of Computing(STOC’98).1998.
[9]DATAR M.Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the 20th ACM Symposium on Computational Geometry.2004:253-262.
[10]ZOLOTAREV V M.One-dimensional stable distributions[M].American Mathematical Society,1986.
[11]GAEDE V,GUNTHER O.Multidimensional Access Methods[J].ACM Computing Surveys,1998,30(2):170-231.
[12]DATAR M,IMMORLICA N,INDYK P,et al.Locality sensitive hashing scheme based on p-stable distributions[C]//Procee-dings of the Twentieth Annual Symposium on Computational Geometry(Scg ’04).2004:253-262.
[13]SHRIVASTAVA A,LI P.Asymmetric LSH(ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)[C]//Proceedings of the MIT Press.MIT Press,2014:2321-2329.
[14]ANDONI A,INDYK P,LAARHOVEN T,et al.Practical and optimal LSH for angular distance[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1.2015:1225-1233.
[15]LV Q,JOSEPHSON W,ZHE W,et al.Multi-probe LSH:efficient indexing for high-dimensional similarity search[C]//Proceedings of the International Conference on Very Large Data Bases.VLDB Endowment,2007:950-961.
[16]PANIGRAHY R.Entropy based Nearest Neighbor Search inHigh Dimensions[C]//Proceedings of the Seventeenth Annual {ACM-SIAM} Symposium on Discrete Algorithms.2005:1186-1195.
[17]LIU W,WANG H,ZHANG Y,et al.I-LSH:I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space[C]//Proceedings of the 2019 IEEE 35th International Confe-rence on Data Engineering(ICDE).IEEE,2019:1670-1673.
[18]LIU W,WANG H,ZHANG Y,et al.EI-LSH:An early-termination driven I/O efficient incremental c-approximate nearest neighbor search[J].The VLDB Journal,2021,30(2):215-235.
[19]BAWA M,CONDIE T,GANESAN P.LSH forest:Self-tuningindexes for similarity search[C]//Proceedings of the 14th International Conference on World Wide Web.Chiba,Japan,2005:651-660.
[20]PAN J,MANOCHA D.Bi-level locality sensitive hashing for k-nearest neighbor computation[C]//2012 IEEE 28th International Conference on Data Engineering.IEEE,2012:378-389.
[21]SUN Y,WEI W,QIN J,et al.SRS:Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index[C]//Proceedings of the VLDB Endowment.VLDB Endowment,2014:1-12.
[22]LIU Z Y,CUI Z J,HUANG Z,et al.SK-LSH:An efficient index structure for approximate nearest neighbor search[J].Procee-dings of the VLDB Endowment,2014,7(9):745-756.
[23]ZHENG B,ZHAO X,WENG L,et al.PM-LSH:A fast and accurate LSH framework for high-dimensional approximate NN search[C]//Proceedings of the VLDB Endowment.2020:643-665.
[24]LU K,KUDO M.R2LSH:A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces[C]//Proceedings of the 2020 IEEE 36th International Conference on Data Engineering(ICDE).IEEE,2020:1045-1056.
[1] LI Rui-xiang, MAO Ying-chi, HAO Shuai. Cache Management Method in Mobile Edge Computing Based on Approximate Matching [J]. Computer Science, 2021, 48(1): 96-102.
[2] CHEN Jing-jie, CHE Jie. IK-medoids Based Aircraft Fuel Consumption Clustering Algorithm [J]. Computer Science, 2018, 45(8): 306-309.
[3] TANG Jia-lin, ZHENG Jie-feng, LI Xi-ying and SU Bing-hua. Research on Detecting Algorithm of Moving Target in Aerial Video [J]. Computer Science, 2017, 44(Z11): 175-177.
[4] LIU Yan and ZHANG Lin. Improved Location Anonymous Technology for Big Data Based on Bloom Filter [J]. Computer Science, 2017, 44(6): 144-149.
[5] XU Zhe,CHEN Fu-cai,LI Shao-mei and LI Xing. Image Retrieval Based on Multi-probe Locality Sensitive Hashing and Word Map Chain Voting [J]. Computer Science, 2014, 41(5): 82-85.
[6] QU Wu,WANG Li-jun and HAN Xiao-guang. Distributed Data Stream Clustering Based on LSH on Cloud Environments [J]. Computer Science, 2014, 41(11): 195-202.
[7] CAI Heng, LI Zhou-Jun,SUN Jian,LI Yang. Fast Chinese Text Search Based on LSH [J]. Computer Science, 2009, 36(8): 201-204.
[8] XU Juan,LI Zhan-lin,KE Xi-lin. Novel Prefix Encoding Scheme Based on Layered Structure [J]. Computer Science, 2009, 36(7): 145-149.
[9] . [J]. Computer Science, 2006, 33(3): 212-214.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!