计算机科学 ›› 2024, Vol. 51 ›› Issue (1): 41-49.doi: 10.11896/jsjkx.231000202

• 创刊五十周年特别专题 • 上一篇    下一篇

学习型过滤器综述

李猛, 戴海鹏, 眭永熙, 顾荣, 陈贵海   

  1. 计算机软件新技术国家重点实验室(南京大学) 南京210023
  • 收稿日期:2023-10-27 修回日期:2023-12-20 出版日期:2024-01-15 发布日期:2024-01-12
  • 通讯作者: 戴海鹏(haipengdai@nju.edu.cn)
  • 作者简介:(meng@nju.edu.cn)
  • 基金资助:
    国家自然科学基金(62272223)

Survey of Learning-based Filters

LI Meng, DAI Haipeng, SUI Yongxi, GU Rong, CHEN Guihai   

  1. State Key Laboratory for Novel Software Technology(Nanjing University),Nanjing 210023,China
  • Received:2023-10-27 Revised:2023-12-20 Online:2024-01-15 Published:2024-01-12
  • About author:LI Meng,born in 1993,Ph.D,is a member of CCF(No.B5508M).His main research interests include data-intensive IOT systems and so on.
    DAI Haipeng,born in 1985,Ph.D,associate professor,Ph.D supervisor,is a distinguished member of CCF(No.19625S).His main research interests include data mining and mobile computing.
  • Supported by:
    National Natural Science Foundation of China(62272223).

摘要: 作为一种高效的概率性结构,过滤器可以高效地解决近似集合成员查询问题。近年来,随着机器学习技术的发展,一些学习型过滤器表现出色,超越了传统的过滤器。这些学习型过滤器考虑数据分布信息,将集合成员查询问题视为二分类问题,实现了超越传统过滤器的性能。受此启发,学习型过滤器研究领域迅速发展,出现了多个变种。然而,目前还缺乏对近些年相关工作的系统性回顾和比较。为了填补上述空缺,文中全面回顾了近年来的学习型过滤器相关工作,并展望了未来的发展方向。

关键词: 近似成员资格查询, 机器学习, Bloom过滤器, 学习型过滤器, 假阳率

Abstract: As a space-efficient probability structure,filters can efficiently solve approximate set membership queries.In recent years,with the development of machine learning technology,some learning-based filters have exceeded traditional filters in performance.These learning-based filters propose to consider data distribution information and treat set membership queries as a binary classification problem,achieving superior performance compared to traditional filters.Inspired by this,the research field of learning-based filters has progressed rapidly,and several variants have emerged.However,there is still a lack of a systematic review and comparison of recent related work.In order to fill this gap,this paper comprehensively reviews recent related work on learning-based filters,analyzes their structure design and theoretical analysis,and predicts the future development direction.

Key words: Approximate membership query, Machine learning, Bloom filter, Learning-based filter, False positive rate

中图分类号: 

  • TP391
[1]FAN B,ANDERSEN D G,KAMINSKY M,et al.Cuckoo Filter:Practically Better Than Bloom[C]//Proceedings of the International Conference on emerging Networking Experiments and Technologies.ACM,2014:75-88.
[2]DUTTA S,NARANG A,BERA S K.Streaming quotient filter:a near optimal approximate duplicate detection approach for data streams[C]//Proceedings of the VLDB Endowment.VLDB,2013:589-600.
[3]LI M,CHEN D,DAI H,et.al.Seesaw Counting Filter:An Efficient Guardian for Vulnerable Negative Keys During Dynamic Filtering[C]//Proceedings of the Web Conference 2022.ACM,2022:2759-2767.
[4]DAI H,SHAHZAD M,LIU A X,et al.Identifying and Estimating Persistent Items in Data Streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442.
[5]WANG H,DAI H,LI M,et al.Bamboo Filters:Make Resizing Smooth[C]//Proceedings of the International Conference on Data Engineering.IEEE,2022:979-991.
[6]LI M,DAI H,WANG X,et al.Thresholded Monitoring in Distributed Data Streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046.
[7]ABDENNEBI A,KAYA K.A Bloom Filter Survey:Variants for Different Domain Applications[J].arXiv:2106.12189,2021.
[8]DAI H,YU J,LI M,et al.Bloom Filter With Noisy CodingFramework for Multi-Set Membership Testing[J].IEEE Transactions on Knowledge and Data Engineering,2023,35(7):6710-6724.
[9]DAI H,ZHONG Y,LIU A X,et al.Noisy Bloom Filters for Multi-Set Membership Testing[C]//Proceedings of the International Conference on Measurement and Modeling of Computer Science.ACM,2016:139-151.
[10]CHEN Z,HUANG J,WANG Q,et al.MEB:an Efficient and Accurate Multicast using Bloom Filter with Customized Hash Function[C]//Proceedings of the 7th Asia-Pacific Workshop on Networking.ACM,2023:157-163.
[11]EVEN T,EVEN G,MORRISON A.Prefix filter:practically and theoretically better than bloom[C]//Proceedings of the VLDB Endowment.VLDB,2022:1311-1323.
[12]DAYAN N,TWITTO M.Chucky:A Succinct Cuckoo Filter for LSM-Tree[C]//Proceedings of the International Conference on Management of Data.ACM,2021:365-378.
[13]GUBNER T,TOMÉ D,LANG H,et al.Fluid Co-processing:GPU Bloom-filters for CPU Joins[C]//Proceedings of the 15th International Workshop on Data Management on New Hardware.ACM,2019:1-10.
[14]LI Y,WANG Z,YANG R,et al.Learned Bloom Filter for Multi-key Membership Testing[C]//Proceedings of the Database Systems for Advanced Applications.Cham:Springer Nature Switzerland,2023:62-79.
[15]FU P,LUO L,LI S,et al.The Vertical Cuckoo Filters:A Family of Insertion-friendly Sketches for Online Applications[C]//Proceedings of the International Conference on Distributed Computing Systems.IEEE,2021:57-67.
[16]DAI M,XU S,SHAO S,et al.Blockchain-Based Reliable Fog-Cloud Service Solution for IIoT [J].Chinese Journal of Electronics,2021,30(2):359-366.
[17]LI P,YU X,XU H,et al.Secure Localization Technology Based on Dynamic Trust Management in Wireless Sensor Networks [J].Chinese Journal of Electronics,2021,30(4):759-768.
[18]LI L,WU J,HU H,et al.Secure Cloud Architecture for 5G Core Network [J].Chinese Journal of Electronics,2021,30(3):516-522.
[19]KRASKA T,BEUTEL A,CHI E H,et al.The Case for Learned Index Structures[C]//Proceedings of the International Confe-rence on Management of Data.ACM,2018:489-504.
[20]MITZENMACHER M.A Model for Learned Bloom Filters and Optimizing by Sandwiching[C]//Proceedings of the Advances in Neural Information Processing Systems.Curran Associates,Inc.,2018:462-471.
[21]DAI Z,SHRIVASTAVA A.Adaptive Learned Bloom Filter(Ada-BF):Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web[C]//Proceedings of the Advances in Neural Information Processing Systems.ACM,2020:11700-11710.
[22]VAIDYA K,KNORR E,KRASKA T,et al.Partitioned Learned Bloom Filter[J].arXiv:2006.03176,2020.
[23]LIU Q,ZHENG L,SHEN Y,et al.Stable learned bloom filters for data streams[C]//Proceedings of the VLDB Endowment.VLDB.2020:2355-2367.
[24]LI M,XIE R,CHEN D,et al.A Pareto optimal Bloom filterfamily with hash adaptivity[J].The Springer International Journal on Very Large Data Bases,2022,32(3):525-548.
[25]DEEDS K,HENTSCHEL B,IDREOS S.Stacked filters:lear-ning to filter by structure[C]//Proceedings of the VLDB Endowment.VLDB,2020:600-612.
[26]BLOOM B H.Space/time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426.
[27]BHATTACHARYA A,GUDESA C,BAGCHI A,et al.Newwine in an old bottle:data-aware hash functions for bloom filters[C]//Proceedings of the VLDB Endowment.VLDB,2022:1924-1936.
[28]DAYAN N,ATHANASSOULIS M,IDREOS S.Optimal Bloom Filters and Adaptive Merging for LSM-Trees[J].ACM Transactions on Database Systems,2018,43(4):1-48.
[29]CHEN H,LIAO L,JIN H,et al.The dynamic cuckoo filter[C]//Proceedings of the International Conference on Network Protocols.IEEE,2017:1-10.
[30]BRESLOW A D,JAYASENA N S.Morton filters:faster,space-efficient cuckoo filters via biasing,compression,and decoupled logical sparsity[C]//Proceedings of the VLDB Endowment.VLDB,2018:1041-1055.
[31]GRAF T M,LEMIRE D.Binary Fuse Filters:Fast and Smaller Than Xor Filters[J].ACM Journal of Experimental Algorithmics,2022,27:1-15.
[32]FICARA D,DI PIETRO A,GIORDANO S,et al.Enhancingcounting bloom filters through Huffman-coded multilayer structures[J].IEEE/ACM Transactions on Networking,2010,18(6):1977-1987.
[33]LAUFER R P,VELLOSO P B,DUARTE O C M B.A Genera-lized Bloom Filter to Secure Distributed Network Applications[J].Computer Networks,2011,55(8):1804-1819.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!