Computer Science ›› 2012, Vol. 39 ›› Issue (8): 182-185.

• Database and Data Mining •

Research on Optimal Fractional Bit Minwise Hashing

Yuan Xinpan, Long Jun, Zhang Zuping, Luo Yueyi, Gui Weihua

  1. (School of Information Science and Engineering, Central South University, Changsha 410083)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: In information retrieval, minwise hashing is often used to estimate the similarity between sets such as documents. b-bit minwise hashing gains substantial advantages in computational efficiency and storage space by storing only the lowest b bits of each (minwise) hashed value (e.g., b=1 or 2). Fractional bit minwise hashing offers a wider range of trade-offs between accuracy and storage space. For a given fractional bit f, there are many ways to construct f. We theoretically analyze a limited set of fractional bit combinations and derive the optimal fractional bit. Extensive experiments demonstrate the effectiveness of the method.

Key words: Similarity estimation, Hashing, Optimal fractional bit
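
For orientation, the two baseline estimators named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes integer set elements, simulates random permutations with a SplitMix64-style 64-bit mixing function (our choice, not from the paper), and uses the sparse-data approximation in which two different minima agree on their lowest b bits with probability about 2^-b. The helper names `make_seeds`, `signature`, `estimate_resemblance`, and `estimate_resemblance_b_bit` are hypothetical. The fractional bit construction itself is not shown, since the abstract does not specify it.

```python
import random

MASK64 = (1 << 64) - 1


def mix64(z):
    """SplitMix64-style finalizer: a bijection on 64-bit integers (assumed stand-in for a random permutation)."""
    z = (z + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def make_seeds(k, seed=0):
    """Draw k 64-bit seeds; each seed defines one simulated permutation."""
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(k)]


def signature(items, seeds):
    """Minwise signature: the minimum permuted value of the set under each permutation."""
    return [min(mix64(x ^ s) for x in items) for s in seeds]


def estimate_resemblance(sig1, sig2):
    """Original minwise estimator: fraction of positions where the full minima match."""
    matches = sum(u == v for u, v in zip(sig1, sig2))
    return matches / len(sig1)


def estimate_resemblance_b_bit(sig1, sig2, b):
    """b-bit estimator under the sparse-data approximation.

    Only the lowest b bits of each minimum are compared. With match
    rate E_b and collision constant C = 2**-b (an approximation that
    assumes the sets are tiny relative to the hash universe), the
    resemblance is estimated as (E_b - C) / (1 - C).
    """
    mask = (1 << b) - 1
    matches = sum((u & mask) == (v & mask) for u, v in zip(sig1, sig2))
    e_b = matches / len(sig1)
    c = 1.0 / (1 << b)
    return (e_b - c) / (1.0 - c)


if __name__ == "__main__":
    # Two sets with true resemblance 150 / 450 = 1/3.
    set_a = set(range(0, 300))
    set_b = set(range(150, 450))
    seeds = make_seeds(800, seed=1)
    sig_a = signature(set_a, seeds)
    sig_b = signature(set_b, seeds)
    print("full minwise:", estimate_resemblance(sig_a, sig_b))
    print("1-bit minwise:", estimate_resemblance_b_bit(sig_a, sig_b, 1))
```

With the same number of permutations, the b-bit estimate is noisier than the full estimate because storing fewer bits raises the variance. This is the accuracy versus storage trade-off that the choice of bit width controls.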
