计算机科学 ›› 2016, Vol. 43 ›› Issue (Z6): 410-412.doi: 10.11896/j.issn.1002-137X.2016.6A.097

• 数据挖掘 • 上一篇    下一篇

连接位极大似然动态过滤算法

曹阳,袁鑫攀,龙军   

  1. 中南大学信息科学与工程学院 长沙410083,湖南工业大学计算机与通信学院 株洲412000,中南大学信息科学与工程学院 长沙410083
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金项目(61472450,61379110),湖南省自然科学青年基金项目(2015JJ3058),湖南工业大学自然科学基金(2014HZX17),湖南省教育厅科技研究项目(14C0325),国家级大学生创新创业训练计划项目(201511535006),湖南省研究生科研创新项目(CX2015B567)资助

Dynamic Filtering Algorithm of Connected Bit Maximum Likelihood Minwise Hash

CAO Yang, YUAN Xin-pan and LONG Jun   

  • Online:2018-11-14 Published:2018-11-14

摘要: Minwise Hash极大似然估计子RMle综合考虑所有事件的发生概率,可以提高估计精度,但降低了估计的效率。连接位Minwise Hash估计子RMinwise,c可以成倍减少比对次数,动态阈值过滤器能够进一步提高Minwise Hash算法和其变种算法的效率。结合连接位极大似然估计子和动态阈值过滤器,提出了连接位极大似然动态过滤算法R(TMle,c)。实验表明,R(TMle,c)具有精度和效率兼顾的特性,计算时间最少,并且在k>300的条件下,其准确度与RMle的近乎相等。

关键词: 极大似然,相似性检测,哈希,连接位,动态阈值

Abstract: Maximum likelihood estimator (RMle) can improve the average accuracy.Connected bit Minwise Hash (RMinwise,c) can exponentially improve efficiency of similarity estimation.Dynamic threshold filter makes further improvement on efficiency.Combining RMinwise,c and dynamic threshold filter,the maximum likelihood dynamic filtering algorithm of Connected bit Minwise Hash was proposed.Experimental results demonstrate that R(TMle,c) can get second-best precision and efficiency,and is the most cost effective business in the estimator options (RMle,RMle,c,RMinwise,c,R(TMle,c).

Key words: Maximum likelihood estimator,Similarity measure,Hash,Connected bit,Dynamic threshold

[1] Broder A Z,Charikar M,Frieze A M,et al.Min-wise indepen-dent permutations [J].Journal of Computer Systems and Scie-nces,2000,60(3):630-659
[2] Kalpakis K,Tang S.Collaborative data gathering in wirelesssensor networks using measurement co-occurrence [J].ComputerCommunications,2008,31(10):1979-1992
[3] Dourisboure Y,Geraci F,Pellegrini M.Extraction and classification of dense implicit communities in the web graph [J].ACM Transactions on the Web (TWEB),2009,3(2):1-36
[4] Bendersky M,Croft W B.Finding text reuse on the Web [C]∥Proceedings of the Second ACM International Conference on Web Search and Data Mining(WSDM’09).New York,USA:ACM,2009:262-271
[5] Buehrer G,Chellapilla K.A scalable pattern mining approach to web graph compression with communities [C]∥Proceedings of the International Conference on Web Search and Web Data Mi-ning(WSDM’08).New York,USA:ACM,2008:95-106
[6] Indyk P.A small approximately min-wise independent family of Hash functions [J].Journal of Algorithm,2001,38(1):84-90
[7] Charikar M S.Similarity estimation techniques from roundingalgorithms [C]∥Proceedings of the Thiry-fourth Annual ACM Symposium on Theory of Computing(STOC’02).New York,USA:ACM,2002:380-388
[8] Li P,Knig A C.b-bit minwise hashing [C]∥Proceedings of the 19th International Conference on World Wide Web(WWW’10).New York,USA:ACM,2010:671-680
[9] Li P,Knig A C.Theory and applications of b-bit minwise hashing [J].Communications of the ACM,2011,54(8):101-109
[10] Yuan Xin-pan,Long Jun,Zhang Zu-ping,et al.Connected bitminwise hashing [J].Journal of Computer Research and Deve-lopment,2013,50(4):883-890

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!