计算机科学 ›› 2025, Vol. 52 ›› Issue (8): 393-402.doi: 10.11896/jsjkx.240900085

• 信息安全 • 上一篇    下一篇

基于自适应采样的超级传播者检测算法

孙靖宇1, 黄河1, 孙玉娥2, 张博宇1   

  1. 1 苏州大学计算机科学与技术学院 江苏 苏州 215006
    2 苏州大学轨道交通学院 江苏 苏州 215131
  • 收稿日期:2024-09-12 修回日期:2024-12-08 出版日期:2025-08-15 发布日期:2025-08-08
  • 通讯作者: 孙玉娥(sunye12@suda.edu.cn)
  • 作者简介:(20235227128@stu.suda.edu.cn)
  • 基金资助:
    国家自然科学基金(62332013,62072322,U20A20182,62202322)

Super Spreader Detection Algorithm Based on Adaptive Sampling

SUN Jingyu1, HUANG He1, SUN Yu'e2, ZHANG Boyu1   

  1. 1 School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    2 School of Rail Transportation,Soochow University,Suzhou,Jiangsu 215131,China
  • Received:2024-09-12 Revised:2024-12-08 Online:2025-08-15 Published:2025-08-08
  • About author:SUN Jingyu,born in 2001,postgra-duate.His main research interest is network traffic measurement.
    SUN Yu'e,born in 1983,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.28402M).Her main research interests include traffic measurement,Internet of Things,privacy presrving and crowd sensing.
  • Supported by:
    National Natural Science Foundation of China(62332013,62072322,U20A20182,62202322).

摘要: 在高速网络环境中,超级传播者被界定为那些具有大量连接的主机或设备。准确的超级传播者检测在网络监控、安全分析及流量管理等多种应用中起着至关重要的作用。基于Sketch的可逆算法因具有卓越的内存效率与从内部结构中恢复超级传播者ID的能力,受到了广泛关注。根据应用需要,通常将同一台主机或设备发出或接收的所有数据包抽象为一条流。在高速网络中,流的分布通常高度偏斜,仅有少部分流为大流,绝大多数是小流。然而,现有研究的内存结构设计无法高效地适应高度偏斜的流分布,使得内存资源利用率较为低下。为此,设计了一种基于自适应采样的超级传播者检测算法AS-SSD(Adaptive Sampling Based Super Spreader Detection),该算法通过一种基于寄存器共享的自适应采样策略,弥补了上述不足。AS-SSD首先将到达的流元素映射到一个寄存器数组中,使得小流仅使用少量寄存器,而越大的流使用越多的寄存器,从而适应偏斜的流分布;接着,将所有流的元素映射到一个寄存器数组中,使得小流仅使用少量寄存器,大流使用更多的寄存器,从而适应偏斜的流分布;然后,利用自适应采样策略动态调整不同规模流的元素采样概率,在保证精度的前提下,减少大流对寄存器的占用,进一步提升内存资源的利用效率。实验评估显示,AS-SSD在维持高吞吐量的同时,在超级传播者检测任务中展现出了更高的检测准确度,相比目前最先进的算法,最高可以将F1值提高0.609以上。

关键词: 流量测量, 超级传播者检测, 高速网络, 自适应采样, Sketch

Abstract: In high-speed network environments,super-spreaders are defined as hosts or devices with a large number of connections.Accurate super spreader detection is crucial for many applications such as network monitoring,security analysis,and traffic management.Invertible algorithms based on sketches have received extensive attention due to their excellent memory efficiency and the ability to directly recover the super spreader identity from the internal structure.According to application interests,the packets sent or received from the same host or device are abstracted as a flow.The flow distribution in high-speed network is usually highly skewed,only a few flows are large flows,and the vast majority are small flows.However,the memory structure design of the existing research cannot efficiently adapt to the highly skewed flow distribution,which makes the memory resource utilization low.Therefore,this paper designs an adaptive sampling based super spreader detection algorithm AS-SSD.The algorithm proposes an adaptive sampling strategy based on register sharing,which makes up for the above shortcomings.AS-SSD first maps the elements of all flows into a register array.Small flows only consume a few registers,while larger flows consume more,which adapts to the skewed flow distribution.Then,AS-SSD uses an adaptive sampling strategy to dynamically adjust the element sampling probability of flows of different sizes,which reduces the occupation of registers by large flows and further improves the utilization efficiency of memory resources under the premise of ensuring accuracy.Experimental evaluations show that compared with the previous work,AS-SSD shows higher detection accuracy in the super spreader detection task while maintaining high throughput processing capability.Compared with the most advanced algorithms,it can increase the F1 value by more than 0.609.

Key words: Traffic measurement, Super spreader detection, High-speed network, Adaptive sampling, Sketch

中图分类号: 

  • TP391
[1]SEN S,WANG J.Analyzing peer-to-peer traffic across largenetworks[C]//Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement.2002:137-150.
[2]JING X,YAN Z,PEDRYCZ W.Security data collection and data analytics in the internet:A survey[J].IEEE Communications Surveys & Tutorials,2018,21(1):586-618.
[3]DURUMERIC Z,BAILEY M,HALDERMAN J A.An {Internet-Wide} view of {Internet-Wide} scanning[C]//23rd USENIX Security Symposium(USENIX Security 14).2014:65-78.
[4]SINGH S,ESTAN C,VARGHESE G,et al.Automated Worm Fingerprinting[C]//OSDI.2004.
[5]JuniperNetworks.Qfx5110 ethernet switch datasheet[EB/OL].https://www.juniper.net/us/en/products/switches/qfx-series/.
[6]YOON M K,CHEN S.Detecting stealthy spreaders by random aging streaming filters[J].IEICE Transactions Cn communications,2011,94(8):2274-2281.
[7]WANG H,MA C,ODEGBILE O O,et al.Randomized error removal for online spread estimation in data streaming[C]//Proceedings of the VLDB Endowment.2021.
[8]TANG L,XIAO Y,HUANG Q,et al.A high-performance invertible sketch for network-wide superspreader detection[J].IEEE/ACM Transactions on Networking,2022,31(2):724-737.
[9]BRUSCHI V,PONTARELLI S,TOLLET J,et al.FlowFight:High performance-low memory top-k spreader detection[J].Computer Networks,2021,196:108239.
[10]JING X,YAN Z,HAN H,et al.ExtendedSketch:Fusing network traffic for super host identification with a memory efficient sketch[J].IEEE Transactions on Dependable and Secure Computing,2021,19(6):3913-3924.
[11]HAN H,JING X,YAN Z,et al.ExtendedSketch+:Super host identification and network host trust evaluation with memory efficiency and high accuracy[J].Information Fusion,2023,92:300-312.
[12]JIA P,WANG P,ZHANG Y,et al.Accurately estimating user cardinalities and detecting super spreaders over time[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(1):92-106.
[13]CAIDA.Anonymized internet traces 2019 [EB/OL].https://catalog.caida.org/dataset/ passive_2019_pcap.
[14]CAO J,JIN Y,CHEN A,et al.Identifying high cardinality internet hosts[C]//IEEE INFOCOM 2009.IEEE,2009:810-818.
[15]KAMIYAMA N,MORI T,KAWAHARA R.Simple and adaptive identification of superspreaders by flow sampling[C]//IEEE INFOCOM 2007-26th IEEE International Conference on Computer Communications.IEEE,2007:2481-2485.
[16]SHI X,CHIU D M,LUI J C S.An online framework for cat-ching top spreaders and scanners[J].Computer Networks,2010,54(9):1375-1388.
[17]VENKATARAMAN S,SONG D X,GIBBONS P B,et al.New streaming algorithms for fast detection of superspreaders[C]//NDSS.2005:149-166.
[18]LI Y,MIAO R,KIM C,et al.{FlowRadar}:A better {NetFlow} for data centers[C]//13th USENIX Symposium on Networked Systems Design and Implementation(NSDI 16).2016:311-324.
[19]YANG T,GAO S,SUN Z,et al.Diamond sketch:Accurate per-flow measurement for bigstreaming data[J].IEEE Transactions on Parallel and Distributed Systems,2019,30(12):2650-2662.
[20]HUANG H,SUN Y E,MA C,et al.Spread estimation withnon-duplicate sampling in high-speed networks[J].IEEE/ACM Transactions on Networking,2021,29(5):2073-2086.
[21]DU Y,HUANG H,SUN Y E,et al.Self-adaptive sampling for network traffic measurement[C]//IEEE INFOCOM 2021-IEEE Conference on Computer Communications.IEEE,2021:1-10.
[22]DU Y,HUANG H,SUN Y E,et al.Short-term memory sampling for spread measurement in high-speed networks[C]//IEEE INFOCOM 2022-IEEE Conference on Computer Communications.IEEE,2022:470-479.
[23]KUMAR K,XU J,WANG J,et al.Spacecode bloom filter for efficient per-flow traffic measurement[C]//Infocom Twenty-Third Joint Conference of the IEEE Computer & Communications Societies.2004.
[24]DING D,SAVI M,PEDERZOLLI F,et al.In-network volumetric DDoS victim identification using programmable commodity switches[J].IEEE Transactions on Network and Service Management,2021,18(2):1191-1202.
[25]CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:the count-min sketch and its applications[J].Journal of Algorithms,2005,55(1):58-75.
[26]ESTAN C,VARGHESE G,FISK M.Bitmap algorithms forcounting active flows on high speed links[C]//Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement.2003:153-166.
[27]MA C,CHEN S,ZHANG Y,et al.Super spreader identification using geometric-min filter[J].IEEE/ACM Transactions on Networking,2021,30(1):299-312.
[28]WANG P,GUAN X,QIN T,et al.A data streaming method for monitoring host connection degrees of high-speed links[J].IEEE Transactions on Information Forensics and Security,2011,6(3):1086-1098.
[29]LIU W,QU W,GONG J,et al.Detection of superpoints using a vector bloom filter[J].IEEE Transactions on Information Forensics and Security,2015,11(3):514-527.
[30]LIU Y,CHEN W,GUAN Y.Identifying high-cardinality hosts from network-wide traffic measurements[J].IEEE Transactions on Dependable and Secure Computing,2015,13(5):547-558.
[31]FLAJOLET P,FUSY É,GANDOUET O,et al.Hyperloglog:the analysis of a near-optimal cardinality estimation algorithm[C]//Discrete Mathematics & Theoretical Computer Science.2007.
[32]METWALLY A,AGRAWAL D,EL ABBADI A.Efficient computation of frequent and top-k elements in data streams[C]//International Conference on Database Theory.Berlin:Springer,2005:398-412.
[33]XIAO Q,CHEN S,CHEN M,et al.Hyper-compact virtual estimators for big network data based on register sharing[C]//Proceedings of the 2015 ACM SIGMETRICS International Confe-rence on Measurement and Modeling of Computer Systems.2015:417-428.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!