计算机科学 ›› 2025, Vol. 52 ›› Issue (8): 393-402.doi: 10.11896/jsjkx.240900085
孙靖宇1, 黄河1, 孙玉娥2, 张博宇1
SUN Jingyu1, HUANG He1, SUN Yu'e2, ZHANG Boyu1
摘要: 在高速网络环境中,超级传播者被界定为那些具有大量连接的主机或设备。准确的超级传播者检测在网络监控、安全分析及流量管理等多种应用中起着至关重要的作用。基于Sketch的可逆算法因具有卓越的内存效率与从内部结构中恢复超级传播者ID的能力,受到了广泛关注。根据应用需要,通常将同一台主机或设备发出或接收的所有数据包抽象为一条流。在高速网络中,流的分布通常高度偏斜,仅有少部分流为大流,绝大多数是小流。然而,现有研究的内存结构设计无法高效地适应高度偏斜的流分布,使得内存资源利用率较为低下。为此,设计了一种基于自适应采样的超级传播者检测算法AS-SSD(Adaptive Sampling Based Super Spreader Detection),该算法通过一种基于寄存器共享的自适应采样策略,弥补了上述不足。AS-SSD首先将到达的流元素映射到一个寄存器数组中,使得小流仅使用少量寄存器,而越大的流使用越多的寄存器,从而适应偏斜的流分布;接着,将所有流的元素映射到一个寄存器数组中,使得小流仅使用少量寄存器,大流使用更多的寄存器,从而适应偏斜的流分布;然后,利用自适应采样策略动态调整不同规模流的元素采样概率,在保证精度的前提下,减少大流对寄存器的占用,进一步提升内存资源的利用效率。实验评估显示,AS-SSD在维持高吞吐量的同时,在超级传播者检测任务中展现出了更高的检测准确度,相比目前最先进的算法,最高可以将F1值提高0.609以上。
中图分类号:
[1]SEN S,WANG J.Analyzing peer-to-peer traffic across largenetworks[C]//Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement.2002:137-150. [2]JING X,YAN Z,PEDRYCZ W.Security data collection and data analytics in the internet:A survey[J].IEEE Communications Surveys & Tutorials,2018,21(1):586-618. [3]DURUMERIC Z,BAILEY M,HALDERMAN J A.An {Internet-Wide} view of {Internet-Wide} scanning[C]//23rd USENIX Security Symposium(USENIX Security 14).2014:65-78. [4]SINGH S,ESTAN C,VARGHESE G,et al.Automated Worm Fingerprinting[C]//OSDI.2004. [5]JuniperNetworks.Qfx5110 ethernet switch datasheet[EB/OL].https://www.juniper.net/us/en/products/switches/qfx-series/. [6]YOON M K,CHEN S.Detecting stealthy spreaders by random aging streaming filters[J].IEICE Transactions Cn communications,2011,94(8):2274-2281. [7]WANG H,MA C,ODEGBILE O O,et al.Randomized error removal for online spread estimation in data streaming[C]//Proceedings of the VLDB Endowment.2021. [8]TANG L,XIAO Y,HUANG Q,et al.A high-performance invertible sketch for network-wide superspreader detection[J].IEEE/ACM Transactions on Networking,2022,31(2):724-737. [9]BRUSCHI V,PONTARELLI S,TOLLET J,et al.FlowFight:High performance-low memory top-k spreader detection[J].Computer Networks,2021,196:108239. [10]JING X,YAN Z,HAN H,et al.ExtendedSketch:Fusing network traffic for super host identification with a memory efficient sketch[J].IEEE Transactions on Dependable and Secure Computing,2021,19(6):3913-3924. [11]HAN H,JING X,YAN Z,et al.ExtendedSketch+:Super host identification and network host trust evaluation with memory efficiency and high accuracy[J].Information Fusion,2023,92:300-312. [12]JIA P,WANG P,ZHANG Y,et al.Accurately estimating user cardinalities and detecting super spreaders over time[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(1):92-106. [13]CAIDA.Anonymized internet traces 2019 [EB/OL].https://catalog.caida.org/dataset/ passive_2019_pcap. [14]CAO J,JIN Y,CHEN A,et al.Identifying high cardinality internet hosts[C]//IEEE INFOCOM 2009.IEEE,2009:810-818. [15]KAMIYAMA N,MORI T,KAWAHARA R.Simple and adaptive identification of superspreaders by flow sampling[C]//IEEE INFOCOM 2007-26th IEEE International Conference on Computer Communications.IEEE,2007:2481-2485. [16]SHI X,CHIU D M,LUI J C S.An online framework for cat-ching top spreaders and scanners[J].Computer Networks,2010,54(9):1375-1388. [17]VENKATARAMAN S,SONG D X,GIBBONS P B,et al.New streaming algorithms for fast detection of superspreaders[C]//NDSS.2005:149-166. [18]LI Y,MIAO R,KIM C,et al.{FlowRadar}:A better {NetFlow} for data centers[C]//13th USENIX Symposium on Networked Systems Design and Implementation(NSDI 16).2016:311-324. [19]YANG T,GAO S,SUN Z,et al.Diamond sketch:Accurate per-flow measurement for bigstreaming data[J].IEEE Transactions on Parallel and Distributed Systems,2019,30(12):2650-2662. [20]HUANG H,SUN Y E,MA C,et al.Spread estimation withnon-duplicate sampling in high-speed networks[J].IEEE/ACM Transactions on Networking,2021,29(5):2073-2086. [21]DU Y,HUANG H,SUN Y E,et al.Self-adaptive sampling for network traffic measurement[C]//IEEE INFOCOM 2021-IEEE Conference on Computer Communications.IEEE,2021:1-10. [22]DU Y,HUANG H,SUN Y E,et al.Short-term memory sampling for spread measurement in high-speed networks[C]//IEEE INFOCOM 2022-IEEE Conference on Computer Communications.IEEE,2022:470-479. [23]KUMAR K,XU J,WANG J,et al.Spacecode bloom filter for efficient per-flow traffic measurement[C]//Infocom Twenty-Third Joint Conference of the IEEE Computer & Communications Societies.2004. [24]DING D,SAVI M,PEDERZOLLI F,et al.In-network volumetric DDoS victim identification using programmable commodity switches[J].IEEE Transactions on Network and Service Management,2021,18(2):1191-1202. [25]CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:the count-min sketch and its applications[J].Journal of Algorithms,2005,55(1):58-75. [26]ESTAN C,VARGHESE G,FISK M.Bitmap algorithms forcounting active flows on high speed links[C]//Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement.2003:153-166. [27]MA C,CHEN S,ZHANG Y,et al.Super spreader identification using geometric-min filter[J].IEEE/ACM Transactions on Networking,2021,30(1):299-312. [28]WANG P,GUAN X,QIN T,et al.A data streaming method for monitoring host connection degrees of high-speed links[J].IEEE Transactions on Information Forensics and Security,2011,6(3):1086-1098. [29]LIU W,QU W,GONG J,et al.Detection of superpoints using a vector bloom filter[J].IEEE Transactions on Information Forensics and Security,2015,11(3):514-527. [30]LIU Y,CHEN W,GUAN Y.Identifying high-cardinality hosts from network-wide traffic measurements[J].IEEE Transactions on Dependable and Secure Computing,2015,13(5):547-558. [31]FLAJOLET P,FUSY É,GANDOUET O,et al.Hyperloglog:the analysis of a near-optimal cardinality estimation algorithm[C]//Discrete Mathematics & Theoretical Computer Science.2007. [32]METWALLY A,AGRAWAL D,EL ABBADI A.Efficient computation of frequent and top-k elements in data streams[C]//International Conference on Database Theory.Berlin:Springer,2005:398-412. [33]XIAO Q,CHEN S,CHEN M,et al.Hyper-compact virtual estimators for big network data based on register sharing[C]//Proceedings of the 2015 ACM SIGMETRICS International Confe-rence on Measurement and Modeling of Computer Systems.2015:417-428. |
|