计算机科学 ›› 2023, Vol. 50 ›› Issue (4): 32-39.doi: 10.11896/jsjkx.220600078
丁际文, 刘卓锦, 王家兴, 张岩峰, 于戈
DING Jiwen, LIU Zhuojin, WANG Jiaxing, ZHANG Yanfeng, YU Ge
摘要: 最近邻搜索由于其广泛的应用已成为一个重要的研究课题。传统的空间索引结构,如R-tree和KD-tree,可以在低维空间中高效地返回准确的最近邻搜索结果,但不适用于高维空间。局部敏感B树(LSB)将数据点哈希到可排序的一维值,并将它们排列成树状结构,这在不影响结果质量的前提下极大地提高了传统局部敏感哈希(LSH)所需的空间和查询效率。但是,LSB并没有考虑到数据分布,它在均匀的数据分布设置中表现良好,但在数据倾斜时表现出了不稳定的性能。针对这个问题,文中提出了LayerLSB,通过探索哈希值的密度对密集范围内的哈希值进行重建,使其分布更均匀,从而提高查询效率。 相比LSB,LayerLSB索引在数据分布方面变得更有针对性,并构建了多层结构,与简单的重新哈希方法相比,多层方法会通过仔细选择组数和哈希函数来保证搜索质量。实验结果表明,在达到相同查询精度的情况下,查询成本最多可降低为原来的44.6%。
中图分类号:
[1]GUTTMAN A.R-trees:a dynamic index structure for spatialsearching[C]//Proceedings of Acm Sigmod International Conference on Management of Data.New York,1984:47-57. [2]KATAYAMA N.The SR-tree:an index structure for high-dimensional nearest neighbor queries[J].ACM Sigmod Record,1997,26(2):369-380. [3]BENTLEY J L.Multidimensional binary search trees used for associative searching[J].Communications of the ACM,1975,18(9):509-517. [4]WEBERR.A Quantitative Analysis and Performance Study for Similarity-Search Methods in high-Dimensional Spaces[C]//Proceedings of the VLDB Conference.New York,1998:194-205. [5]TAO Y,KE Y,CHENG S,et al.Quality and efficiency in high dimensional nearest neighbor search[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data.Rhode Island,USA,2009:563-576. [6]YAO B,LI F,KUMAR P.K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free[C]//Proceedings of the 2010 IEEE 26th International Conference on Data Engineering.2010:1103-1114. [7]CHI Z,LI F,JESTES J.Efficient parallel kNN joins for largedata in MapReduce[C]//Proceedings of the Extending Database Technology.ACM,2012:38-49. [8]INDYK P.Approximate Nearest Neighbors:Towards Removing the Curse of Dimensionality[C]//Proceedings of the 30th ACM Symposium on Theory of Computing(STOC’98).1998. [9]DATAR M.Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the 20th ACM Symposium on Computational Geometry.2004:253-262. [10]ZOLOTAREV V M.One-dimensional stable distributions[M].American Mathematical Society,1986. [11]GAEDE V,GUNTHER O.Multidimensional Access Methods[J].ACM Computing Surveys,1998,30(2):170-231. [12]DATAR M,IMMORLICA N,INDYK P,et al.Locality sensitive hashing scheme based on p-stable distributions[C]//Procee-dings of the Twentieth Annual Symposium on Computational Geometry(Scg ’04).2004:253-262. [13]SHRIVASTAVA A,LI P.Asymmetric LSH(ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)[C]//Proceedings of the MIT Press.MIT Press,2014:2321-2329. [14]ANDONI A,INDYK P,LAARHOVEN T,et al.Practical and optimal LSH for angular distance[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1.2015:1225-1233. [15]LV Q,JOSEPHSON W,ZHE W,et al.Multi-probe LSH:efficient indexing for high-dimensional similarity search[C]//Proceedings of the International Conference on Very Large Data Bases.VLDB Endowment,2007:950-961. [16]PANIGRAHY R.Entropy based Nearest Neighbor Search inHigh Dimensions[C]//Proceedings of the Seventeenth Annual {ACM-SIAM} Symposium on Discrete Algorithms.2005:1186-1195. [17]LIU W,WANG H,ZHANG Y,et al.I-LSH:I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space[C]//Proceedings of the 2019 IEEE 35th International Confe-rence on Data Engineering(ICDE).IEEE,2019:1670-1673. [18]LIU W,WANG H,ZHANG Y,et al.EI-LSH:An early-termination driven I/O efficient incremental c-approximate nearest neighbor search[J].The VLDB Journal,2021,30(2):215-235. [19]BAWA M,CONDIE T,GANESAN P.LSH forest:Self-tuningindexes for similarity search[C]//Proceedings of the 14th International Conference on World Wide Web.Chiba,Japan,2005:651-660. [20]PAN J,MANOCHA D.Bi-level locality sensitive hashing for k-nearest neighbor computation[C]//2012 IEEE 28th International Conference on Data Engineering.IEEE,2012:378-389. [21]SUN Y,WEI W,QIN J,et al.SRS:Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index[C]//Proceedings of the VLDB Endowment.VLDB Endowment,2014:1-12. [22]LIU Z Y,CUI Z J,HUANG Z,et al.SK-LSH:An efficient index structure for approximate nearest neighbor search[J].Procee-dings of the VLDB Endowment,2014,7(9):745-756. [23]ZHENG B,ZHAO X,WENG L,et al.PM-LSH:A fast and accurate LSH framework for high-dimensional approximate NN search[C]//Proceedings of the VLDB Endowment.2020:643-665. [24]LU K,KUDO M.R2LSH:A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces[C]//Proceedings of the 2020 IEEE 36th International Conference on Data Engineering(ICDE).IEEE,2020:1045-1056. |
[1] | 郦睿翔, 毛莺池, 郝帅. 基于近似匹配的移动边缘计算缓存管理方法 Cache Management Method in Mobile Edge Computing Based on Approximate Matching 计算机科学, 2021, 48(1): 96-102. https://doi.org/10.11896/jsjkx.200800215 |
[2] | 李红梅,郝文宁,陈刚. 基于改进LSH的协同过滤推荐算法 Collaborative Filtering Recommendation Algorithm Based on Improved Locality-sensitive Hashing 计算机科学, 2015, 42(10): 256-261. |
[3] | 许喆,陈福才,李邵梅,李星. 基于多探寻局部敏感哈希和单词映射链投票的图像检索方法 Image Retrieval Based on Multi-probe Locality Sensitive Hashing and Word Map Chain Voting 计算机科学, 2014, 41(5): 82-85. https://doi.org/10.11896/j.issn.1002-137X.2014.05.018 |
[4] | 徐娟,李战怀,柯希林. 基于分层结构的前缀编码方案研究 Novel Prefix Encoding Scheme Based on Layered Structure 计算机科学, 2009, 36(7): 145-149. https://doi.org/10.11896/j.issn.1002-137X.2009.07.034 |
[5] | 裘君 吴朝晖 徐昭. 基于OWL本体论映射的数据库网格语义模式集成研究 计算机科学, 2005, 32(5): 4-7. |
[6] | 王斌君 郝克刚. 工作流过程定义中的分层结构与正则Petri网 计算机科学, 2003, 30(11): 157-159. |
|