Computer Science ›› 2024, Vol. 51 ›› Issue (1): 35-40.doi: 10.11896/jsjkx.231000193

• Special Issue on the 53th Anniversary of Computer Science • Previous Articles     Next Articles

Filter Data Structures:A Survey

WANG Hancheng, DAI Haipeng, CHEN Shusen, CHEN Zhipeng, CHEN Guihai   

  1. State Key Laboratory for Novel Software Technology,Nanjing University,Nanjing 210023,China
  • Received:2023-10-26 Revised:2023-11-23 Online:2024-01-15 Published:2024-01-12
  • About author:WANG Hancheng,born in 1998,postgraduate,is a student member of CCF(No.F2622G).His main research in-terests include data indexing and query optimization.
    DAI Haipeng,born in 1985,Ph.D,associate professor,Ph.D supervisor,is a distinguished member of CCF(No.19625S).His main research interests include data mining and mobile computing.
  • Supported by:
    National Natural Science Foundation of China(62272223).

Abstract: Filter data structures can approximately determine whether an element exists in a given set.Typical filter data structures,such as Bloom filters,cuckoo filters,and quotient filters,sacrifice query accuracy for lower memory space consumption and lower query time overhead.Due to their spatial and temporal efficiency,filter data structures are now widely used in approximate membership query operations in computer networks,the Internet of Things,database systems,file systems,bioinformatics,machine learning,and other fields.Since the 1970s,filters have been extensively studied.Their research ideas are constantly changing.This paper compiles the classic studies on filter data structures in the past fifty years,summarizes existing studies based on the mechanism of filter data structures and analyze the relationship between different studies.Finally,future research directions in filter data structures are discussed.

Key words: Filter, Approximate membership query, Probabilistic data structure, Bloom filter, Cuckoo filter, Quotient filter

CLC Number: 

  • TP391
[1]YU M,FABRIKANT A,REXFORD J.BUFFALO:Bloom filter forwarding architecture for large organizations[C]//Proceedings of International Conference on Emerging Networking Experiments and Technologies.2009:313-324.
[2]ATHANASSOULIS M,AILAMAKI A.BF-Tree:Approximate tree indexing[J].Proceedings of the VLDB Endowment,2014,7(14):1881-1892.
[3]LI P,LUO B,ZHU W,et al.Cluster-based distributed dynamic cuckoo filter system forRedis[J].International Journal of Parallel,Emergent and Distributed Systems,2020,35(3):340-353.
[4]WANG F,CHEN H,LIAO L,et al.The power of betterchoice:Reducing relocations in cuckoo filter[C]//Proceedings of International Conference on Distributed Computing Systems.2019:358-367.
[5]GU R,LI S,DAI H,et al.Adaptive online cache capacity optimi-zation via lightweight working set size estimation atscale[C]//Proceedings of USENIX Annual Technical Conference.2023:467-484.
[6]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[J].IEEE/ACM Transactions on Networking,2020,28(1):1-14.
[7]DAI H,SHAHZAD M,LIU A X,et al.Identifying and estimating persistent items in data streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442.
[8]XIA R,DAI H,DU Z,et al.Mining robust frequent items in data streams[C]//Proceedings of International Conference on Joint Cloud Computing.2020:110-117.
[9]DAI H,LI M,WANG H,et al.Distantly supervised entity lin-king with selection consistency constraint[C]//Proceedings of International Conference on Database Systems for Advanced Applications.2023:784-799.
[10]LIU J,DAI H,XIA R,et al.DUET:A generic framework for finding special quadratic elements in data streams[C]//Procee-dings of ACM Web Conference.2022:2989-2997.
[11]YU S,LI X,WANG H,et al.C_CART:An instance confidence-based decision tree algorithm for classification[J].Intelligent Data Analysis,2021,25(4):929-948.
[12]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[C]//Proceedings of International Confe-rence on Distributed Computing Systems.2019:218-227.
[13]YU S,LI X,WANG H,et al.BIDI:A classification algorithm with instance difficulty invariance[J].Expert Systems with Applications,2021,165(1):1-13.
[14]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046.
[15]DAI H,SHAHZAD M,LIU A X,et al.Finding persistent items in data streams[J].Proceedings of the VLDB Endowment,2016,10(4):289-300.
[16]DAI M,XU S,SHAO S,et al.Blockchain-based reliable fog-cloud service solution for IIoT[J].Chinese Journal of Electro-nics,2021,30(2):359-366.
[17]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[C]//Proceedings of International Conference on Computer Communications.2018:1403-1411.
[18]WANG H,DAI H,LI M,et al.Bamboo filters:Make resizingsmooth[C]//Proceedings of International Conference on Data Engineering.2022:979-991.
[19]LI M,CHEN D,DAI H,et al.Seesaw counting filter:An efficient guardian for vulnerable negative keys during dynamic filtering[C]//Proceedings of the ACM Web Conference.2022:2759-2767.
[20]LI M,XIE R,CHEN D,et al.A pareto optimal bloom filter fa-mily with hash adaptivity[J].The VLDB Journal,2023,32(3):525-548.
[21]WANG H,DAI H,GU R,et al.Wormhole filters:Caching your hash on persistent memory[C]//Proceedings of European Conference on Computer Systems.2024:1-16.
[22]LI Y,WANG Z,YANG R,et al.Learned bloom filter for multi-key membership testing[C]//Proceedings of the International Conference on Database Systems for Advanced Applications.2023:62-79.
[23]DAI H,YU J,LI M,et al.Bloom filter with noisy coding framework for multi-set membership testing[J].IEEE Transactions on Knowledge and Data Engineering,2022,35(7):6710-6724.
[24]LI L,WU J,HU H,et al.Secure cloud architecture for 5G core network[J].Chinese Journal of Electronics,2021,30(3):516-522.
[25]DAI H,ZHONG Y,LIU A X,et al.Noisy bloom filters formulti-set membership testing[C]//Proceedings of International Conference on Measurement and Modeling of Computer Science.2016:139-151.
[26]LI P,YU X,XU H,et al.Secure localization technology based on dynamic trust management in wireless sensor networks[J].Chinese Journal of Electronics,2021,30(4):759-768.
[27]BLOOM B H.Space/time tradeoffs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426.
[28]FAN L,CAO P,ALMEIDA J M,et al.Summary cache:A scalable wide area web cache sharing protocol[J].IEEE/ACM Transactions on Networking,2000,8(3):281-293.
[29]BONOMI F,MITZENMACHER M,PANIGRAHY R,et al.An improved construction for counting bloom filters[C]//Procee-dings of the Annual European Symposium.2006:684-695.
[30]MITZENMACHER M.Compressed bloom filters[J].IEEE/ACM Transactions on Networking,2002,10(5):604-612.
[31]GUO D,WU J,CHEN H,et al.The dynamic bloom filters[J].IEEE Transactions on Knowledge and Data Engineering,2009,22(1):120-133.
[32]WU Y,HE J,YAN S,et al.Elastic bloom filter:Deletable andexpandable filter using elastic fingerprints[J].IEEE Transactions on Computers,2021,71(4):984-991.
[33]KRASKA T,BEUTEL A,CHI E H,et al.The case for learned index structures[C]//Proceedings of the 2018 International Conference on Management of Data.2018:489-504.
[34]PENG Y,GUO J,LI F,et al.Persistent bloom filter:Membership testing for the entire history[C]//Proceedings of the 2018 International Conference on Management of Data.2018:1037-1052.
[35]XIE R,LI M,MIAO Z,et al.Hash adaptive bloom filter[C]//Proceedings of the 37th IEEE International Conference on Data Engineering.2021:636-647.
[36]ROTHENBERG C E,MACAPUNA C A B,VERDI F L,et al.The deletable bloom filter:a new member of the bloom family[J].IEEE Communications Letters,2010,14(6):557-559.
[37]MOSHARRAF N,JAYASUMANA A P,RAY I.Compactedbloom filter[C]//Proceedings of the 2nd IEEE International Conference on Collaboration and Internet Computing.2016:304-311.
[38]ALMEIDA P S,BAQUERO C,PREGUIÇA N M,et al.Scalable bloom filters[J].Information Processing Letters,2007,101(6):255-261.
[39]LU J,YANG T,WANG Y,et al.One hashing bloom filter[C]//Proceedings of the International Symposium on Quality of Ser-vice.2015:289-298.
[40]BHATTACHARYA A,BEDATHUR S,BAGCHI A.Adaptive learned bloom filters under incremental workloads[C]//Proceedings of the India Joint International Conference on Data Science and Management of Data.2020:107-115.
[41]LIU Q,ZHENG L,SHEN Y,et al.Stable learnedbloom filters for data streams[J].Proceedings of the VLDB Endowment,2020,13(11):2355-2367.
[42]GOSWAMI M,PAGH R,SILVESTRI F,et al.Distance sensitive bloom filters without false negatives[C]//Proceedings of the Annual Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics.2017:257-269.
[43]COHEN S,MATIAS Y.Spectral bloom filters[C]//Proceedings of the International Conference on Management of Data.2003:241-252.
[44]MATSUMOTO Y,HAZEYAMA H,KADOBAYASHI Y.Adaptive bloom filter:A space-efficient counting algorithm for unpredictable network traffic[J].IEICE Transactions on Information and Systems,2008,91(5):1292-1299.
[45]FAN B,ANDERSEN D G,KAMINSKY M,et al.Cuckoo filter:Practically better than bloom[C]//Proceedings of International Conference on Emerging Networking Experiments and Techno-logies.2014:75-88.
[46]BENDER M A,FARACHCOLTON M,JOHNSON R,et al.Don’t thrash:How to cache your hash on flash[J].Proceedings of the VLDB Endowment,2012,5(11):1627-1637.
[47]CHEN H,LIAO L,JIN H,et al.The dynamic cuckoo filter[C]//Proceedings of the International Conference on Network Protocols.2017:1-10.
[48]BRESLOW A D,JAYASENA N.Morton filters:Faster,space-efficient cuckoo filters via biasing,compression,and decoupled logical sparsity[J].Proceedings of the VLDB Endowment,2018,11(9):1041-1055.
[49]BRESLOW A D,JAYASENA N.Morton filters:fast,com-pressed sparse cuckoo filters[J].The VLDB Journal,2020,29(2):731-754.
[50]WANG M,ZHOU M,SHI S,et al.Vacuum filters:More space-efficient and faster replacement for bloom and cuckoo filters[J].Proceedings of the VLDB Endowment,2019,13(2):197-210.
[51]GRAF T M,LEMIRE D.Xor filters:Faster and smaller thanbloom and cuckoo filters[J].ACM Journal of Experimental Algorithmics,2020,25(1):1-16.
[52]ZHANG F,CHEN H,JIN H,et al.The logarithmic dynamic cuckoo filter[C]//Proceedings of the 37th IEEE International Conference on Data Engineering.2021:948-959.
[53]FU P,LUO L,GUO D,et al.Jump filter:Dynamic sketch design for big data governance[J].Journal of Software,2023,34(3):1193-1212.
[54]HUANG K,YANG T.Additive and subtractive cuckoo filters[C]//Proceedings of the International Symposium on Quality of Service.2020:1-10.
[55]HUANG K,YANG T.Tagged cuckoo filters[C]//Proceedings of the International Conference on Computer Communications and Networks.2021:1-10.
[56]PANDEY P,BENDER M A,JOHNSON R,et al.A general-purpose counting filter:Making every bit count[C]//Proceedings of the International Conference on Management of Data.2017:775-787.
[57]PANDEY P,CONWAY A,DURIE J,et al.Vector quotient filters:Overcoming the time/space trade-off in filter design[C]//Proceedings of the International Conference on Management of Data.2021:1386-1399.
[58]DAYAN N,BERCEA I,REVIRIEGO P,et al.InfiniFilter:Expanding filters to infinity and beyond[J].Proceedings of the ACM on Management of Data,2023,1(2):1-27.
[1] LI Meng, DAI Haipeng, SUI Yongxi, GU Rong, CHEN Guihai. Survey of Learning-based Filters [J]. Computer Science, 2024, 51(1): 41-49.
[2] QIU Zhen, ZHENG Zhaohui. Graph Similarity Search with High Efficiency and Low Index [J]. Computer Science, 2023, 50(9): 130-138.
[3] HUANG Lu, NI Lyu, JIN Cheqing. Rectifying Dual Bias for Recommendation [J]. Computer Science, 2023, 50(9): 152-159.
[4] WANG Luo, LI Biao, FU Ruigang. Infrared Ground Multi-object Tracking Method Based on Improved ByteTrack Algorithm [J]. Computer Science, 2023, 50(9): 176-183.
[5] BAI Zhengyao, XU Zhu, ZHANG Yihan. Deep Artificial Correspondence Generation for 3D Point Cloud Registration [J]. Computer Science, 2023, 50(9): 210-219.
[6] CUI Lin, CUI Chenlu, LIU Zhengwei, XUE Kai. Speech Emotion Recognition Based on Improved MFCC and Parallel Hybrid Model [J]. Computer Science, 2023, 50(6A): 220800211-7.
[7] SHAO Peng. FIR Low Pass Digital Filter Based on Multi-strategy Discrete Artificial Bee Colony Algorithm [J]. Computer Science, 2023, 50(6A): 220700026-5.
[8] JIANG Gaoxia, QIN Pei, WANG Wenjian. Noise Estimation and Filtering Methods with Limit Distance [J]. Computer Science, 2023, 50(6): 151-158.
[9] LIU Zejing, WU Nan, HUANG Fuqun, SONG You. Hybrid Programming Task Recommendation Model Based on Knowledge Graph and Collaborative Filtering for Online Judge [J]. Computer Science, 2023, 50(2): 106-114.
[10] ZHANG Qi, YU Shuangyuan, YIN Hongfeng, XU Baomin. Neural Collaborative Filtering for Social Recommendation Algorithm Based on Graph Attention [J]. Computer Science, 2023, 50(2): 115-122.
[11] LI Chen, WAN Yuan. Study on Time Series Shapelets Extraction Based on Optimization and Two-phase Filtering [J]. Computer Science, 2023, 50(2): 146-157.
[12] YANG Xin, LI Gengxin, LI Hui. EHFM:An Efficient Hierarchical Filtering Method for Multi-source Network Malicious Alerts [J]. Computer Science, 2023, 50(2): 324-332.
[13] FAN Hongyu, ZHANG Yongku, MENG Xiangfu. Recommendation Method Based on Knowledge Graph Residual Attention Networks [J]. Computer Science, 2023, 50(11A): 220900180-7.
[14] JIANG Binze, DENG Xin, DU Yulu, ZHANG Heng. Next-basket Recommendation Algorithm Based on Correlation Between Items Collaborative Filtering [J]. Computer Science, 2023, 50(11A): 221000076-6.
[15] MA Handa, FANG Yuqing. Dynamic Negative Sampling for Graph Convolution Network Based Collaborative Filtering Recommendation Model [J]. Computer Science, 2023, 50(11A): 230200149-7.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!