计算机科学 ›› 2024, Vol. 51 ›› Issue (1): 35-40.doi: 10.11896/jsjkx.231000193
王瀚橙, 戴海鹏, 陈树森, 陈志鹏, 陈贵海
WANG Hancheng, DAI Haipeng, CHEN Shusen, CHEN Zhipeng, CHEN Guihai
摘要: 过滤器数据结构可以近似地判断某个元素是否属于给定集合。典型的过滤器数据结构,如布隆过滤器、布谷鸟过滤器、商过滤器,以牺牲查询准确性为代价换取更低的内存空间消耗和查询时间开销。因此,得益于空间时间高效性,过滤器数据结构现已被广泛应用于计算机网络、物联网、数据库系统、文件系统、生物信息学、机器学习等领域的近似成员资格查询操作中。自20世纪70年代以来,过滤器数据结构受到了广泛的研究,在诸多领域取得了重要的进展,其研究思路也在不断变化。文中整理了近五十年来关于过滤器数据结构的经典研究成果,从过滤器数据结构的原理出发对已有工作进行分类总结,并比较不同工作之间的引证关系和改进思路,最后讨论了过滤器数据结构的未来研究方向。
中图分类号:
[1]YU M,FABRIKANT A,REXFORD J.BUFFALO:Bloom filter forwarding architecture for large organizations[C]//Proceedings of International Conference on Emerging Networking Experiments and Technologies.2009:313-324. [2]ATHANASSOULIS M,AILAMAKI A.BF-Tree:Approximate tree indexing[J].Proceedings of the VLDB Endowment,2014,7(14):1881-1892. [3]LI P,LUO B,ZHU W,et al.Cluster-based distributed dynamic cuckoo filter system forRedis[J].International Journal of Parallel,Emergent and Distributed Systems,2020,35(3):340-353. [4]WANG F,CHEN H,LIAO L,et al.The power of betterchoice:Reducing relocations in cuckoo filter[C]//Proceedings of International Conference on Distributed Computing Systems.2019:358-367. [5]GU R,LI S,DAI H,et al.Adaptive online cache capacity optimi-zation via lightweight working set size estimation atscale[C]//Proceedings of USENIX Annual Technical Conference.2023:467-484. [6]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[J].IEEE/ACM Transactions on Networking,2020,28(1):1-14. [7]DAI H,SHAHZAD M,LIU A X,et al.Identifying and estimating persistent items in data streams[J].IEEE/ACM Transactions on Networking,2018,26(6):2429-2442. [8]XIA R,DAI H,DU Z,et al.Mining robust frequent items in data streams[C]//Proceedings of International Conference on Joint Cloud Computing.2020:110-117. [9]DAI H,LI M,WANG H,et al.Distantly supervised entity lin-king with selection consistency constraint[C]//Proceedings of International Conference on Database Systems for Advanced Applications.2023:784-799. [10]LIU J,DAI H,XIA R,et al.DUET:A generic framework for finding special quadratic elements in data streams[C]//Procee-dings of ACM Web Conference.2022:2989-2997. [11]YU S,LI X,WANG H,et al.C_CART:An instance confidence-based decision tree algorithm for classification[J].Intelligent Data Analysis,2021,25(4):929-948. [12]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[C]//Proceedings of International Confe-rence on Distributed Computing Systems.2019:218-227. [13]YU S,LI X,WANG H,et al.BIDI:A classification algorithm with instance difficulty invariance[J].Expert Systems with Applications,2021,165(1):1-13. [14]LI M,DAI H,WANG X,et al.Thresholded monitoring in distributed data streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046. [15]DAI H,SHAHZAD M,LIU A X,et al.Finding persistent items in data streams[J].Proceedings of the VLDB Endowment,2016,10(4):289-300. [16]DAI M,XU S,SHAO S,et al.Blockchain-based reliable fog-cloud service solution for IIoT[J].Chinese Journal of Electro-nics,2021,30(2):359-366. [17]DAI H,LI M,LIU A X,et al.Finding persistent items in distributed datasets[C]//Proceedings of International Conference on Computer Communications.2018:1403-1411. [18]WANG H,DAI H,LI M,et al.Bamboo filters:Make resizingsmooth[C]//Proceedings of International Conference on Data Engineering.2022:979-991. [19]LI M,CHEN D,DAI H,et al.Seesaw counting filter:An efficient guardian for vulnerable negative keys during dynamic filtering[C]//Proceedings of the ACM Web Conference.2022:2759-2767. [20]LI M,XIE R,CHEN D,et al.A pareto optimal bloom filter fa-mily with hash adaptivity[J].The VLDB Journal,2023,32(3):525-548. [21]WANG H,DAI H,GU R,et al.Wormhole filters:Caching your hash on persistent memory[C]//Proceedings of European Conference on Computer Systems.2024:1-16. [22]LI Y,WANG Z,YANG R,et al.Learned bloom filter for multi-key membership testing[C]//Proceedings of the International Conference on Database Systems for Advanced Applications.2023:62-79. [23]DAI H,YU J,LI M,et al.Bloom filter with noisy coding framework for multi-set membership testing[J].IEEE Transactions on Knowledge and Data Engineering,2022,35(7):6710-6724. [24]LI L,WU J,HU H,et al.Secure cloud architecture for 5G core network[J].Chinese Journal of Electronics,2021,30(3):516-522. [25]DAI H,ZHONG Y,LIU A X,et al.Noisy bloom filters formulti-set membership testing[C]//Proceedings of International Conference on Measurement and Modeling of Computer Science.2016:139-151. [26]LI P,YU X,XU H,et al.Secure localization technology based on dynamic trust management in wireless sensor networks[J].Chinese Journal of Electronics,2021,30(4):759-768. [27]BLOOM B H.Space/time tradeoffs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426. [28]FAN L,CAO P,ALMEIDA J M,et al.Summary cache:A scalable wide area web cache sharing protocol[J].IEEE/ACM Transactions on Networking,2000,8(3):281-293. [29]BONOMI F,MITZENMACHER M,PANIGRAHY R,et al.An improved construction for counting bloom filters[C]//Procee-dings of the Annual European Symposium.2006:684-695. [30]MITZENMACHER M.Compressed bloom filters[J].IEEE/ACM Transactions on Networking,2002,10(5):604-612. [31]GUO D,WU J,CHEN H,et al.The dynamic bloom filters[J].IEEE Transactions on Knowledge and Data Engineering,2009,22(1):120-133. [32]WU Y,HE J,YAN S,et al.Elastic bloom filter:Deletable andexpandable filter using elastic fingerprints[J].IEEE Transactions on Computers,2021,71(4):984-991. [33]KRASKA T,BEUTEL A,CHI E H,et al.The case for learned index structures[C]//Proceedings of the 2018 International Conference on Management of Data.2018:489-504. [34]PENG Y,GUO J,LI F,et al.Persistent bloom filter:Membership testing for the entire history[C]//Proceedings of the 2018 International Conference on Management of Data.2018:1037-1052. [35]XIE R,LI M,MIAO Z,et al.Hash adaptive bloom filter[C]//Proceedings of the 37th IEEE International Conference on Data Engineering.2021:636-647. [36]ROTHENBERG C E,MACAPUNA C A B,VERDI F L,et al.The deletable bloom filter:a new member of the bloom family[J].IEEE Communications Letters,2010,14(6):557-559. [37]MOSHARRAF N,JAYASUMANA A P,RAY I.Compactedbloom filter[C]//Proceedings of the 2nd IEEE International Conference on Collaboration and Internet Computing.2016:304-311. [38]ALMEIDA P S,BAQUERO C,PREGUIÇA N M,et al.Scalable bloom filters[J].Information Processing Letters,2007,101(6):255-261. [39]LU J,YANG T,WANG Y,et al.One hashing bloom filter[C]//Proceedings of the International Symposium on Quality of Ser-vice.2015:289-298. [40]BHATTACHARYA A,BEDATHUR S,BAGCHI A.Adaptive learned bloom filters under incremental workloads[C]//Proceedings of the India Joint International Conference on Data Science and Management of Data.2020:107-115. [41]LIU Q,ZHENG L,SHEN Y,et al.Stable learnedbloom filters for data streams[J].Proceedings of the VLDB Endowment,2020,13(11):2355-2367. [42]GOSWAMI M,PAGH R,SILVESTRI F,et al.Distance sensitive bloom filters without false negatives[C]//Proceedings of the Annual Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics.2017:257-269. [43]COHEN S,MATIAS Y.Spectral bloom filters[C]//Proceedings of the International Conference on Management of Data.2003:241-252. [44]MATSUMOTO Y,HAZEYAMA H,KADOBAYASHI Y.Adaptive bloom filter:A space-efficient counting algorithm for unpredictable network traffic[J].IEICE Transactions on Information and Systems,2008,91(5):1292-1299. [45]FAN B,ANDERSEN D G,KAMINSKY M,et al.Cuckoo filter:Practically better than bloom[C]//Proceedings of International Conference on Emerging Networking Experiments and Techno-logies.2014:75-88. [46]BENDER M A,FARACHCOLTON M,JOHNSON R,et al.Don’t thrash:How to cache your hash on flash[J].Proceedings of the VLDB Endowment,2012,5(11):1627-1637. [47]CHEN H,LIAO L,JIN H,et al.The dynamic cuckoo filter[C]//Proceedings of the International Conference on Network Protocols.2017:1-10. [48]BRESLOW A D,JAYASENA N.Morton filters:Faster,space-efficient cuckoo filters via biasing,compression,and decoupled logical sparsity[J].Proceedings of the VLDB Endowment,2018,11(9):1041-1055. [49]BRESLOW A D,JAYASENA N.Morton filters:fast,com-pressed sparse cuckoo filters[J].The VLDB Journal,2020,29(2):731-754. [50]WANG M,ZHOU M,SHI S,et al.Vacuum filters:More space-efficient and faster replacement for bloom and cuckoo filters[J].Proceedings of the VLDB Endowment,2019,13(2):197-210. [51]GRAF T M,LEMIRE D.Xor filters:Faster and smaller thanbloom and cuckoo filters[J].ACM Journal of Experimental Algorithmics,2020,25(1):1-16. [52]ZHANG F,CHEN H,JIN H,et al.The logarithmic dynamic cuckoo filter[C]//Proceedings of the 37th IEEE International Conference on Data Engineering.2021:948-959. [53]FU P,LUO L,GUO D,et al.Jump filter:Dynamic sketch design for big data governance[J].Journal of Software,2023,34(3):1193-1212. [54]HUANG K,YANG T.Additive and subtractive cuckoo filters[C]//Proceedings of the International Symposium on Quality of Service.2020:1-10. [55]HUANG K,YANG T.Tagged cuckoo filters[C]//Proceedings of the International Conference on Computer Communications and Networks.2021:1-10. [56]PANDEY P,BENDER M A,JOHNSON R,et al.A general-purpose counting filter:Making every bit count[C]//Proceedings of the International Conference on Management of Data.2017:775-787. [57]PANDEY P,CONWAY A,DURIE J,et al.Vector quotient filters:Overcoming the time/space trade-off in filter design[C]//Proceedings of the International Conference on Management of Data.2021:1386-1399. [58]DAYAN N,BERCEA I,REVIRIEGO P,et al.InfiniFilter:Expanding filters to infinity and beyond[J].Proceedings of the ACM on Management of Data,2023,1(2):1-27. |
|