计算机科学 ›› 2024, Vol. 51 ›› Issue (4): 28-38.doi: 10.11896/jsjkx.231000119

• 紧凑数据结构 • 上一篇    下一篇

分布式网络中连续时间周期的全局top-K频繁流测量

毛晨宇1, 黄河1, 孙玉娥2, 杜扬1   

  1. 1 苏州大学计算机科学与技术学院 江苏 苏州215006
    2 苏州大学轨道交通学院 江苏 苏州215131
  • 收稿日期:2023-10-18 修回日期:2023-11-27 出版日期:2024-04-15 发布日期:2024-04-10
  • 通讯作者: 孙玉娥(sunye12@suda.edu.cn)
  • 作者简介:(20225227129@stu.suda.edu.cn)
  • 基金资助:
    国家自然科学基金(62332013,62072322,U20A20182,62202322)

Global Top-K Frequent Flow Measurement for Continuous Periods in Distributed Networks

MAO Chenyu1, HUANG He1, SUN Yu'e2, DU Yang1   

  1. 1 School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    2 School of Rail Transportation,Soochow University,Suzhou,Jiangsu 215131,China
  • Received:2023-10-18 Revised:2023-11-27 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62332013,62072322,U20A20182,62202322).

摘要: 在分布式网络中,测量top-K频繁流对资源分配、安全监控等应用至关重要。现有的top-K频繁流测量工作存在不适用于测量分布式网络流量或只考虑单时间周期等局限。为此,提出了分布式网络中连续时间周期的全局top-K频繁流测量方案,在分布节点中布置了紧凑的概率数据结构来记录网络流信息,每个时间周期结束后分布节点向中心节点发送必要信息,中心节点汇聚得到从测量开始至当前时间周期的全局top-K频繁流。考虑到每条流可能出现在一个或多个测量节点,使用了不同的方法来减少传输开销。对于每条流只会出现在单一节点的情况,采用传输分段最小值的方法来获得阈值,实验结果表明这种方法减少了全量传输超过50%的传输开销。对于每条流会出现在多个节点的情况,提出了多阶段无误差处理方法和单阶段快速处理方法,分别应对不能容忍误差的场景和实际高速网络流量,相比每个时间周期都使用已有单周期方法,传输开销的实验表现降低了两个数量级。最后还提出了一种利用历史平均增值信息降低通信延迟的方法,实验结果表明该方法有效降低了限制信息的平均相对误差。

关键词: 流量测量, top-K频繁流, 分布式网络, 连续时间周期, sketch

Abstract: In distributed networks,the measurment of the top-K frequent flows is crucial for applications like resource allocation and security monitoring.Existing works on top-K frequent flow measurement have limitations such as being unsuitable for distributed network traffic measurement or only considering single time periods.To address these problems,this paper proposes a scheme for measuring global top-K frequent flows over continuous time periods in distributed networks.This involves deploying compact probabilistic data structures at distributed nodes to record network flow information.At the end of each time period,distributed nodes send necessary information to a central node,which aggregates this to identify the global top-K frequent flows from the start of measurement to the current time period.Considering that each flow may appear at one or multiple measurement nodes,different methods are used to reduce transmission overhead.For flows appearing at a single node,a method of transmitting segmented minimum values is used to obtain a threshold.Experiments show that this method reduces the transmission overhead by over 50% compared to full transmission.For flows appearing at multiple nodes,a multi-stage error-free processing method and a single-stage fast processing method are proposed,catering to scenarios that cannot tolerate errors and actual high-speed network traffic,respectively.Compared to using existing single-period methods in each time period,experimental performance of transmission overhead reduced by two orders of magnitude.Finally,a method using historical average increment information to reduce communication delay is also proposed,and experimental results show that it effectively reduces the average relative error of constraint information.

Key words: Traffic measurement, top-K frequent flows, Distributed network, Continuous time period, sketch

中图分类号: 

  • TP391
[1]SHARMA H,THAKKAR P,BHARADWAJ S,et al.Optimi-zing Network Provisioning through Cooperation[C]//19th USENIX Symposium on Networked Systems Design and Implementation(NSDI 22).Renton:USENIX Association,2022:1179-1194.
[2]HUANG H,SUN Y E,MA C Y,et al.Spread estimation with non-duplicate sampling in high-speed networks[J].IEEE/ACM Transactions on Networking,2021,29(5):2073-2086.
[3]CAIDA.Anonymized internet traces 2019 [EB/OL].(2019-01)[2023-10-08].https://catalog.caida.org/details/dataset/passive_2019_pcap.
[4]ZHAO Y K,HAN W C,ZHONG Z,et al.Double-Anonymous Sketch:Achieving Top-K-fairness for Finding Global Top-K Frequent Items[J].Proceedings of the ACM on Management of Data,2023,1(1):1-26.
[5]SONG W,BESHLEY M,PRZYSTUPA K,et al.A softwaredeep packet inspection system for network traffic analysis and anomaly detection[J].Sensors,2020,20(6):1637.
[6]JIN K,XIE K,TIAN J Z,et al.Low cost network traffic mea-surement and fast recovery via redundant row subspace-based matrix completion[J].Connection Science,2023,35(1):2218069.
[7]YANG T,ZHANG H W,LI J Y,et al.HeavyKeeper:an accurate algorithm for finding Top-k elephant flows[J].IEEE/ACM Transactions on Networking,2019,27(5):1845-1858.
[8]YANG T,JIANG J,LIU P,et al.Elastic sketch:Adaptive and fast network-wide measurements[C]//Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication.New York:Association for Computing Machinery,2018:561-575.
[9]LIU L T,SHEN Y L,YAN Y B,et al.Sf-sketch:a two-stage sketch for data streams[J].IEEE Transactions on Parallel and Distributed Systems,2020,31(10):2263-2276.
[10]DAI H P,LI M,LIU A,et al.Finding persistent items in distri-buted datasets[J].IEEE/ACM Transactions on Networking,2019,28(1):1-14.
[11]LI M,DAI H P,WANG X Y,et al.Thresholded monitoring in distributed data streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046.
[12]BASAT R,EINZIGER G,TAYH B.Cooperative network-wide flow selection[C]//2020 IEEE 28th International Conference on Network Protocols(ICNP).Madrid:IEEE Press,2020:1-11.
[13]CAO P,WANG Z.Efficient top-k query calculation in distributed networks[C]//Proceedings of the Twenty-third Annual ACM Symposium on Principles of Distributed Computing.New York:Association for Computing Machinery,2004:206-215.
[14]MICHEL S,TRIANTAFILLOU P,WEIKUM G.Klee:Aframework for distributed top-k query algorithms[C]//Procee-dings of the 31st International Conference on Very Large Data Bases.New York:ACM,2005:637-648.
[15]ANCEAUME E,BUSNEL Y,RIVETTI N,et al.Identifying global icebergs in distributed streams[C]//2015 IEEE 34th Symposium on Reliable Distributed Systems(SRDS).Montreal:IEEE Press,2015:266-275.
[16]TARKOMA S,ROTHENBERG C,LAGERSPETZ E.Theory and practice of bloom filters for distributed systems[J].IEEE Communications Surveys & Tutorials,2011,14(1):131-155.
[17]TING D.Data sketches for disaggregated subset sum and frequent item estimation[C]//Proceedings of the 2018 Interna-tional Conference on Management of Data.New York:Association for Computing Machinery,2018:1129-1140.
[18]LI J Z,LI Z K,XU Y F,et al.Wavingsketch:An unbiased and generic sketch for finding top-k items in data streams[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:Association for Computing Machinery,2020:1574-1584.
[19]CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:the count-min sketch and its applications[J].Journal of Algorithms,2005,55(1):58-75.
[20]ESTAN C,VARGHESE G.New directions in traffic measurement and accounting[C]//Proceedings of the 2002 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.New York:New Directions in Traffic Measurement and Accounting,2002:323-336.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!