Computer Science ›› 2024, Vol. 51 ›› Issue (1): 41-49.doi: 10.11896/jsjkx.231000202

• Special Issue on the 54th Anniversary of Computer Science • Previous Articles     Next Articles

Survey of Learning-based Filters

LI Meng, DAI Haipeng, SUI Yongxi, GU Rong, CHEN Guihai   

  1. State Key Laboratory for Novel Software Technology(Nanjing University),Nanjing 210023,China
  • Received:2023-10-27 Revised:2023-12-20 Online:2024-01-15 Published:2024-01-12
  • About author:LI Meng,born in 1993,Ph.D,is a member of CCF(No.B5508M).His main research interests include data-intensive IOT systems and so on.
    DAI Haipeng,born in 1985,Ph.D,associate professor,Ph.D supervisor,is a distinguished member of CCF(No.19625S).His main research interests include data mining and mobile computing.
  • Supported by:
    National Natural Science Foundation of China(62272223).

Abstract: As a space-efficient probability structure,filters can efficiently solve approximate set membership queries.In recent years,with the development of machine learning technology,some learning-based filters have exceeded traditional filters in performance.These learning-based filters propose to consider data distribution information and treat set membership queries as a binary classification problem,achieving superior performance compared to traditional filters.Inspired by this,the research field of learning-based filters has progressed rapidly,and several variants have emerged.However,there is still a lack of a systematic review and comparison of recent related work.In order to fill this gap,this paper comprehensively reviews recent related work on learning-based filters,analyzes their structure design and theoretical analysis,and predicts the future development direction.

Key words: Approximate membership query, Machine learning, Bloom filter, Learning-based filter, False positive rate

CLC Number: 

  • TP391
[1]FAN B,ANDERSEN D G,KAMINSKY M,et al.Cuckoo Filter:Practically Better Than Bloom[C]//Proceedings of the International Conference on emerging Networking Experiments and Technologies.ACM,2014:75-88.
[2]DUTTA S,NARANG A,BERA S K.Streaming quotient filter:a near optimal approximate duplicate detection approach for data streams[C]//Proceedings of the VLDB Endowment.VLDB,2013:589-600.
[3]LI M,CHEN D,DAI H,et.al.Seesaw Counting Filter:An Efficient Guardian for Vulnerable Negative Keys During Dynamic Filtering[C]//Proceedings of the Web Conference 2022.ACM,2022:2759-2767.
[4]DAI H,SHAHZAD M,LIU A X,et al.Identifying and Estimating Persistent Items in Data Streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442.
[5]WANG H,DAI H,LI M,et al.Bamboo Filters:Make Resizing Smooth[C]//Proceedings of the International Conference on Data Engineering.IEEE,2022:979-991.
[6]LI M,DAI H,WANG X,et al.Thresholded Monitoring in Distributed Data Streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046.
[7]ABDENNEBI A,KAYA K.A Bloom Filter Survey:Variants for Different Domain Applications[J].arXiv:2106.12189,2021.
[8]DAI H,YU J,LI M,et al.Bloom Filter With Noisy CodingFramework for Multi-Set Membership Testing[J].IEEE Transactions on Knowledge and Data Engineering,2023,35(7):6710-6724.
[9]DAI H,ZHONG Y,LIU A X,et al.Noisy Bloom Filters for Multi-Set Membership Testing[C]//Proceedings of the International Conference on Measurement and Modeling of Computer Science.ACM,2016:139-151.
[10]CHEN Z,HUANG J,WANG Q,et al.MEB:an Efficient and Accurate Multicast using Bloom Filter with Customized Hash Function[C]//Proceedings of the 7th Asia-Pacific Workshop on Networking.ACM,2023:157-163.
[11]EVEN T,EVEN G,MORRISON A.Prefix filter:practically and theoretically better than bloom[C]//Proceedings of the VLDB Endowment.VLDB,2022:1311-1323.
[12]DAYAN N,TWITTO M.Chucky:A Succinct Cuckoo Filter for LSM-Tree[C]//Proceedings of the International Conference on Management of Data.ACM,2021:365-378.
[13]GUBNER T,TOMÉ D,LANG H,et al.Fluid Co-processing:GPU Bloom-filters for CPU Joins[C]//Proceedings of the 15th International Workshop on Data Management on New Hardware.ACM,2019:1-10.
[14]LI Y,WANG Z,YANG R,et al.Learned Bloom Filter for Multi-key Membership Testing[C]//Proceedings of the Database Systems for Advanced Applications.Cham:Springer Nature Switzerland,2023:62-79.
[15]FU P,LUO L,LI S,et al.The Vertical Cuckoo Filters:A Family of Insertion-friendly Sketches for Online Applications[C]//Proceedings of the International Conference on Distributed Computing Systems.IEEE,2021:57-67.
[16]DAI M,XU S,SHAO S,et al.Blockchain-Based Reliable Fog-Cloud Service Solution for IIoT [J].Chinese Journal of Electronics,2021,30(2):359-366.
[17]LI P,YU X,XU H,et al.Secure Localization Technology Based on Dynamic Trust Management in Wireless Sensor Networks [J].Chinese Journal of Electronics,2021,30(4):759-768.
[18]LI L,WU J,HU H,et al.Secure Cloud Architecture for 5G Core Network [J].Chinese Journal of Electronics,2021,30(3):516-522.
[19]KRASKA T,BEUTEL A,CHI E H,et al.The Case for Learned Index Structures[C]//Proceedings of the International Confe-rence on Management of Data.ACM,2018:489-504.
[20]MITZENMACHER M.A Model for Learned Bloom Filters and Optimizing by Sandwiching[C]//Proceedings of the Advances in Neural Information Processing Systems.Curran Associates,Inc.,2018:462-471.
[21]DAI Z,SHRIVASTAVA A.Adaptive Learned Bloom Filter(Ada-BF):Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web[C]//Proceedings of the Advances in Neural Information Processing Systems.ACM,2020:11700-11710.
[22]VAIDYA K,KNORR E,KRASKA T,et al.Partitioned Learned Bloom Filter[J].arXiv:2006.03176,2020.
[23]LIU Q,ZHENG L,SHEN Y,et al.Stable learned bloom filters for data streams[C]//Proceedings of the VLDB Endowment.VLDB.2020:2355-2367.
[24]LI M,XIE R,CHEN D,et al.A Pareto optimal Bloom filterfamily with hash adaptivity[J].The Springer International Journal on Very Large Data Bases,2022,32(3):525-548.
[25]DEEDS K,HENTSCHEL B,IDREOS S.Stacked filters:lear-ning to filter by structure[C]//Proceedings of the VLDB Endowment.VLDB,2020:600-612.
[26]BLOOM B H.Space/time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426.
[27]BHATTACHARYA A,GUDESA C,BAGCHI A,et al.Newwine in an old bottle:data-aware hash functions for bloom filters[C]//Proceedings of the VLDB Endowment.VLDB,2022:1924-1936.
[28]DAYAN N,ATHANASSOULIS M,IDREOS S.Optimal Bloom Filters and Adaptive Merging for LSM-Trees[J].ACM Transactions on Database Systems,2018,43(4):1-48.
[29]CHEN H,LIAO L,JIN H,et al.The dynamic cuckoo filter[C]//Proceedings of the International Conference on Network Protocols.IEEE,2017:1-10.
[30]BRESLOW A D,JAYASENA N S.Morton filters:faster,space-efficient cuckoo filters via biasing,compression,and decoupled logical sparsity[C]//Proceedings of the VLDB Endowment.VLDB,2018:1041-1055.
[31]GRAF T M,LEMIRE D.Binary Fuse Filters:Fast and Smaller Than Xor Filters[J].ACM Journal of Experimental Algorithmics,2022,27:1-15.
[32]FICARA D,DI PIETRO A,GIORDANO S,et al.Enhancingcounting bloom filters through Huffman-coded multilayer structures[J].IEEE/ACM Transactions on Networking,2010,18(6):1977-1987.
[33]LAUFER R P,VELLOSO P B,DUARTE O C M B.A Genera-lized Bloom Filter to Secure Distributed Network Applications[J].Computer Networks,2011,55(8):1804-1819.
[1] WANG Hancheng, DAI Haipeng, CHEN Shusen, CHEN Zhipeng, CHEN Guihai. Filter Data Structures:A Survey [J]. Computer Science, 2024, 51(1): 35-40.
[2] ZHANG Wenqiong, LI Yun. Fairness Metrics of Machine Learning:Review of Status,Challenges and Future Directions [J]. Computer Science, 2024, 51(1): 266-272.
[3] FU Jianming, JIANG Yuqian, HE Jia, ZHENG Rui, SURI Guga, PENG Guojun. Cryptocurrency Mining Malware Detection Method Based on Sample Embedding [J]. Computer Science, 2024, 51(1): 327-334.
[4] LI Ke, YANG Ling, ZHAO Yanbo, CHEN Yonglong, LUO Shouxi. EGCN-CeDML:A Distributed Machine Learning Framework for Vehicle Driving Behavior Prediction [J]. Computer Science, 2023, 50(9): 318-330.
[5] HUANG Shuxin, ZHANG Quanxin, WANG Yajie, ZHANG Yaoyuan, LI Yuanzhang. Research Progress of Backdoor Attacks in Deep Neural Networks [J]. Computer Science, 2023, 50(9): 52-61.
[6] WANG Yao, LI Yi. Termination Analysis of Single Path Loop Programs Based on Iterative Trajectory Division [J]. Computer Science, 2023, 50(9): 108-116.
[7] WANG Yu, WANG Zuchao, PAN Rui. Survey of DGA Domain Name Detection Based on Character Feature [J]. Computer Science, 2023, 50(8): 251-259.
[8] LI Yang, LI Zhenhua, XIN Xianlong. Attack Economics Based Fraud Detection for MVNO [J]. Computer Science, 2023, 50(8): 260-270.
[9] ZHU Boyu, CHEN Xiao, SHA Letian, XIAO Fu. Two-layer IoT Device Classification Recognition Model Based on Traffic and Text Fingerprints [J]. Computer Science, 2023, 50(8): 304-313.
[10] LU Xingyuan, CHEN Jingwei, FENG Yong, WU Wenyuan. Privacy-preserving Data Classification Protocol Based on Homomorphic Encryption [J]. Computer Science, 2023, 50(8): 321-332.
[11] LIU Xiang, ZHU Jing, ZHONG Guoqiang, GU Yongjian, CUI Liyuan. Quantum Prototype Clustering [J]. Computer Science, 2023, 50(8): 27-36.
[12] WANG Xiya, ZHANG Ning, CHENG Xin. Review on Methods and Applications of Text Fine-grained Emotion Recognition [J]. Computer Science, 2023, 50(6A): 220900137-7.
[13] WANG Dongli, YANG Shan, OUYANG Wanli, LI Baopu, ZHOU Yan. Explainability of Artificial Intelligence:Development and Application [J]. Computer Science, 2023, 50(6A): 220600212-7.
[14] WANG Jinjin, CHENG Yinhui, NIE Xin, LIU Zheng. Fast Calculation Method of High-altitude Electromagnetic Pulse Environment Based on Machine Learning [J]. Computer Science, 2023, 50(6A): 220500046-5.
[15] YIN Xingzi, PENG Ningning, ZHAN Xueyan. Filtered Feature Selection Algorithm Based on Persistent Homology [J]. Computer Science, 2023, 50(6): 159-166.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!