计算机科学 ›› 2024, Vol. 51 ›› Issue (4): 4-10.doi: 10.11896/jsjkx.231000226
陈昕杨1,2, 陈翰泽1, 周嘉晟1, 黄家卿1, 余佳硕1, 朱龙隆1,2, 张栋2,3
CHEN Xinyang1,2, CHEN Hanze1, ZHOU Jiasheng1, HUANG Jiaqing1, YU Jiashuo1, ZHU Longlong1,2, ZHANG Dong2,3
摘要: 流式数据库在数据库中的占比逐渐增加,在流式数据库的数据流中提取所需信息是一项重要任务。文中研究了数据流的间隔项,并将其应用到了网络场景中。其中间隔项指在数据流中以固定时间间隔到达的元素对,这是第一项在数据流中定义和统计间隔项的工作。为了高效统计间隔项的top-K,提出了IntervalSketch。IntervalSketch首先基于模拟退火对数据流分块以加快统计速度,其次利用Sketch进行间隔项的存储,最后通过特征分组存储策略降低Sketch存储间隔项的空间开销,提升了统计间隔项的精度。IntervalSketch在两个真实数据集上进行了大量对比实验,实验结果表明,在同样内存的情况下,IntervalSketch明显优于基线方案,其中处理时间为基线方案的1/3~1/2,平均绝对误差、平均相对误差约为基线方案的1/3。
中图分类号:
[1]CORMODE G,MUTHUKRISHNAN S.An improved datastream summary:The count-min sketch and its applications[C]//LATIN 2004:Theoretical Informatics:6th Latin American Symposium.Berlin Heidelberg:Springer,2004:29-38. [2]YANG T,JIANG J,LIU P,et al.Elastic sketch:Adaptive and fast network-wide measurements[C]//Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication.2018:561-575. [3]TANG L,HUANG Q,LEE P P C.Mv-sketch:A fast and compact invertible sketch for heavy flow detection in network data streams[C]//IEEE Conference on Computer Communications(IEEE 2019).2019:2026-2034. [4]FAN Z,ZHANG Y,YANG T,et al.Periodicsketch:Finding periodic items in data streams[C]//2022 IEEE 38th International Conference on Data Engineering(ICDE).IEEE,2022:96-109. [5]LIU Z,KONG C,YANG K,et al.Hypercalm sketch:One-pass mining periodic batches in data streams[C]//2023 IEEE 39th International Conference on Data Engineering(ICDE).IEEE.2023. [6]FAN Z,GUO J,LI X,et al.Finding Simplex Items in DataStreams[C]//2023 IEEE 39th International Conference on Data Engineering(ICDE).IEEE,2023:1953-1966. [7]LI X,FAN Z,LI H,et al.SteadySketch:Finding Steady Flows in Data Streams[C]//2023 IEEE/ACM 31st International Symposium on Quality of Service(IWQoS).IEEE,2023:1-9. [8]ZHONG Z,YAN S,LI Z,et al.Burstsketch:Finding bursts in data streams[C]//Proceedings of the 2021 International Confe-rence on Management of Data.2021:2375-2383. [9]FAN Z,HU Z,WU Y,et al.Pisketch:finding persistent and infrequent flows[C]//Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures.2022:8-14. [10]WANG J,ZHANG Y.Opportunity model for e-commerce rec-ommendation:right product;right time[C]//Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval.2013:303-312. [11]JAIN V R,BAGREE R,KUMAR A,et al.wildCENSE:GPS based animal tracking system[C]//2008 International Confe-rence on Intelligent Sensors Networks and Information Proces-sing.IEEE,2008:617-622. [12]ROYH A,ZENG J,BAGGAG P,et al.C.Snoeren“Inside the social network's(datacenter) network[C]//Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication.2015:23-137. [13]TAN L,LYANG F,YI J K.Optimisation of Sketch algorithm based on AVX instruction set[J].Computer Science,2021,48(11A):585-587. [14]AGRAWALR,SRIKANT R.Fast algorithms for mining asso-ciation rules[C]//Proceedings 20th International Conference.Very Large Data Bases VLDB,1994,1215:487-499. [15]YUAN Q,SHANG J,CAO X,et al.Detecting multiple periods and periodic patterns in event time sequences[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.2017:617-626. [16]CHANG J H,PAR K N H.A novel weighting technique formining sequence data streams[C]//IT Convergence and Security 2012.Springer Netherlands,2013:929-936.. [17]CHEN Y L,CHIANG M C,KOM T.Discovering time-interval sequential patterns in sequence databases[J].Expert Systems with Applications,2003,25(3):343-354. [18]AGRAWAL R,IMIELIŃSKI T,SWAMI A.Mining asso-ciation rules between sets of items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data.1993:207-216. [19]HAN J,PEI J,YIN Y.Mining frequent patterns without candidate generation[J].ACM Sigmod Record,2000,29(2):1-12. [20]Anonymized Internet Traces 2016[OL].https://catalog.caida.org/dataset/passive_2016_pcap. [21]Taobao user behavior data set[OL].ttps://tianchi.aliyun.com/dataset/649. |
|