Computer Science ›› 2024, Vol. 51 ›› Issue (4): 28-38.doi: 10.11896/jsjkx.231000119

• Compact Data Structure • Previous Articles     Next Articles

Global Top-K Frequent Flow Measurement for Continuous Periods in Distributed Networks

MAO Chenyu1, HUANG He1, SUN Yu'e2, DU Yang1   

  1. 1 School of Computer Science and Technology,Soochow University,Suzhou,Jiangsu 215006,China
    2 School of Rail Transportation,Soochow University,Suzhou,Jiangsu 215131,China
  • Received:2023-10-18 Revised:2023-11-27 Online:2024-04-15 Published:2024-04-10
  • Supported by:
    National Natural Science Foundation of China(62332013,62072322,U20A20182,62202322).

Abstract: In distributed networks,the measurment of the top-K frequent flows is crucial for applications like resource allocation and security monitoring.Existing works on top-K frequent flow measurement have limitations such as being unsuitable for distributed network traffic measurement or only considering single time periods.To address these problems,this paper proposes a scheme for measuring global top-K frequent flows over continuous time periods in distributed networks.This involves deploying compact probabilistic data structures at distributed nodes to record network flow information.At the end of each time period,distributed nodes send necessary information to a central node,which aggregates this to identify the global top-K frequent flows from the start of measurement to the current time period.Considering that each flow may appear at one or multiple measurement nodes,different methods are used to reduce transmission overhead.For flows appearing at a single node,a method of transmitting segmented minimum values is used to obtain a threshold.Experiments show that this method reduces the transmission overhead by over 50% compared to full transmission.For flows appearing at multiple nodes,a multi-stage error-free processing method and a single-stage fast processing method are proposed,catering to scenarios that cannot tolerate errors and actual high-speed network traffic,respectively.Compared to using existing single-period methods in each time period,experimental performance of transmission overhead reduced by two orders of magnitude.Finally,a method using historical average increment information to reduce communication delay is also proposed,and experimental results show that it effectively reduces the average relative error of constraint information.

Key words: Traffic measurement, top-K frequent flows, Distributed network, Continuous time period, sketch

CLC Number: 

  • TP391
[1]SHARMA H,THAKKAR P,BHARADWAJ S,et al.Optimi-zing Network Provisioning through Cooperation[C]//19th USENIX Symposium on Networked Systems Design and Implementation(NSDI 22).Renton:USENIX Association,2022:1179-1194.
[2]HUANG H,SUN Y E,MA C Y,et al.Spread estimation with non-duplicate sampling in high-speed networks[J].IEEE/ACM Transactions on Networking,2021,29(5):2073-2086.
[3]CAIDA.Anonymized internet traces 2019 [EB/OL].(2019-01)[2023-10-08].https://catalog.caida.org/details/dataset/passive_2019_pcap.
[4]ZHAO Y K,HAN W C,ZHONG Z,et al.Double-Anonymous Sketch:Achieving Top-K-fairness for Finding Global Top-K Frequent Items[J].Proceedings of the ACM on Management of Data,2023,1(1):1-26.
[5]SONG W,BESHLEY M,PRZYSTUPA K,et al.A softwaredeep packet inspection system for network traffic analysis and anomaly detection[J].Sensors,2020,20(6):1637.
[6]JIN K,XIE K,TIAN J Z,et al.Low cost network traffic mea-surement and fast recovery via redundant row subspace-based matrix completion[J].Connection Science,2023,35(1):2218069.
[7]YANG T,ZHANG H W,LI J Y,et al.HeavyKeeper:an accurate algorithm for finding Top-k elephant flows[J].IEEE/ACM Transactions on Networking,2019,27(5):1845-1858.
[8]YANG T,JIANG J,LIU P,et al.Elastic sketch:Adaptive and fast network-wide measurements[C]//Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication.New York:Association for Computing Machinery,2018:561-575.
[9]LIU L T,SHEN Y L,YAN Y B,et al.Sf-sketch:a two-stage sketch for data streams[J].IEEE Transactions on Parallel and Distributed Systems,2020,31(10):2263-2276.
[10]DAI H P,LI M,LIU A,et al.Finding persistent items in distri-buted datasets[J].IEEE/ACM Transactions on Networking,2019,28(1):1-14.
[11]LI M,DAI H P,WANG X Y,et al.Thresholded monitoring in distributed data streams[J].IEEE/ACM Transactions on Networking,2020,28(3):1033-1046.
[12]BASAT R,EINZIGER G,TAYH B.Cooperative network-wide flow selection[C]//2020 IEEE 28th International Conference on Network Protocols(ICNP).Madrid:IEEE Press,2020:1-11.
[13]CAO P,WANG Z.Efficient top-k query calculation in distributed networks[C]//Proceedings of the Twenty-third Annual ACM Symposium on Principles of Distributed Computing.New York:Association for Computing Machinery,2004:206-215.
[14]MICHEL S,TRIANTAFILLOU P,WEIKUM G.Klee:Aframework for distributed top-k query algorithms[C]//Procee-dings of the 31st International Conference on Very Large Data Bases.New York:ACM,2005:637-648.
[15]ANCEAUME E,BUSNEL Y,RIVETTI N,et al.Identifying global icebergs in distributed streams[C]//2015 IEEE 34th Symposium on Reliable Distributed Systems(SRDS).Montreal:IEEE Press,2015:266-275.
[16]TARKOMA S,ROTHENBERG C,LAGERSPETZ E.Theory and practice of bloom filters for distributed systems[J].IEEE Communications Surveys & Tutorials,2011,14(1):131-155.
[17]TING D.Data sketches for disaggregated subset sum and frequent item estimation[C]//Proceedings of the 2018 Interna-tional Conference on Management of Data.New York:Association for Computing Machinery,2018:1129-1140.
[18]LI J Z,LI Z K,XU Y F,et al.Wavingsketch:An unbiased and generic sketch for finding top-k items in data streams[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:Association for Computing Machinery,2020:1574-1584.
[19]CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:the count-min sketch and its applications[J].Journal of Algorithms,2005,55(1):58-75.
[20]ESTAN C,VARGHESE G.New directions in traffic measurement and accounting[C]//Proceedings of the 2002 Conference on Applications,Technologies,Architectures,and Protocols for Computer Communications.New York:New Directions in Traffic Measurement and Accounting,2002:323-336.
[1] CHEN Xinyang, CHEN Hanze, ZHOU Jiasheng, HUANG Jiaqing, YU Jiashuo, ZHU Longlong, ZHANG Dong. IntervalSketch:Approximate Statistical Method for Interval Items in Data Stream [J]. Computer Science, 2024, 51(4): 4-10.
[2] WU Yanni, ZHOU Zhengyan, CHEN Hanze, ZHANG Dong. RBFRadar:Detecting Remarkable Burst Flows with Programmable Data Plane [J]. Computer Science, 2024, 51(4): 48-55.
[3] DOU Zhi, WANG Ning, WANG Shi-jie, WANG Zhi-hui, LI Hao-jie. Sketch Colorization Method with Drawing Prior [J]. Computer Science, 2022, 49(4): 195-202.
[4] TAN Ling-ling, YANG Fei, YI Jun-kai. Optimization Study of Sketch Algorithm Based on AVX Instruction Set [J]. Computer Science, 2021, 48(11A): 585-587.
[5] LI Zong-min, LI Si-yuan, LIU Yu-jie, LI Hua. Sketch-based Image Retrieval Based on Attention Model [J]. Computer Science, 2020, 47(11): 199-204.
[6] YANG Li-peng, ZHANG Yang-sen, ZHANG Wen, WANG Jian, ZENG Jian-rong. Web Log Analysis Method Based on Storm Real-time Streaming Computing Framework [J]. Computer Science, 2019, 46(9): 176-183.
[7] LI De-quan, DONG Qiao, ZHOU Yue-jin. Distributed Online Conditional Gradient Optimization Algorithm [J]. Computer Science, 2019, 46(3): 332-337.
[8] YU Mei-yu, WU Hao, GUO Xiao-yan, JIA Qi GUO He. Sequential Feature Based Sketch Recognition [J]. Computer Science, 2018, 45(11A): 198-202.
[9] JI Hai-feng and TIAN Huai-wen. Sketch Recognition Method of Combined Graphs for Conceptual Design [J]. Computer Science, 2016, 43(Z6): 134-138.
[10] JIA Jian-wei and CHEN Ling. Set Similarity Approximation Algorithm Based on Parity of Data Sketch [J]. Computer Science, 2016, 43(6): 254-256.
[11] CHEN Quan, SHI Da-peng, FENG Gui-huan, ZHAO Xiao-yan and LUO Bin. On-line Handwritten Flowchart Recognition Based on Grammar Description Language [J]. Computer Science, 2015, 42(Z11): 113-118.
[12] WANG Guan-feng,WANG Shu-xia,YU Sui-huai and GAO Man-tun. Extraction and Correction of Feature Parameter for Online Freehand Conic Section [J]. Computer Science, 2014, 41(1): 297-299.
[13] DU Cui-lan,TAN Jian-long,WANG Xiao-yan,ZHANG Yu,LIU Ping and FAN Dong-jin. Fault Location Technology Based on the Distributed Event Processing System [J]. Computer Science, 2013, 40(Z6): 302-306.
[14] WU Xiao-liang and TIAN Huai-wen. Method of 3D Reconstruction Based on Given Axonometric Sketching [J]. Computer Science, 2013, 40(9): 275-278.
[15] ZHANG Jin,ZHAO Wen-dong,PENG Lai-xian and WU Ze-min. Per-Flow Traffic Measurement Algorithm with Restrained Relative Error [J]. Computer Science, 2013, 40(6): 80-83.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!