Research on Methods for Finding Recent Frequent Items in Distributed Data Streams

Computer Science (计算机科学), 2008, Vol. 35, Issue (3): 206-208.

Online: 2018-11-16 Published: 2018-11-16

Abstract

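Summary: The traditional model for mining distributed data streams is a hierarchical one in which mining results are processed layer by layer, and communication bandwidth is a bottleneck. To reduce communication among the nodes of a distributed data stream, this paper samples each stream in the distributed stream group with a biased sampling method based on data density, and maintains only the most recent elements of the sampled data. For frequent-item mining, it designs a hash counting method that, unlike traditional hash counting algorithms, can both increment and decrement item counts; the counts are approximate values with a guaranteed error bound. The resulting algorithm is called FFIDDS. Experimental results show that both its communication load and its processing time are clearly better than those of algorithms based on the traditional HCS model.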
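The abstract does not give the details of the FFIDDS hash counter, so the following is only a minimal sketch of the general idea of hash-based counts that can be both incremented and decremented over the most recent elements. It substitutes a count-min style table for the paper's method; the class name `WindowedHashCounter` and the parameters `window`, `width`, and `depth` are assumptions made for illustration, and the density-based biased sampling step is not modeled.

```python
from collections import deque
import hashlib


class WindowedHashCounter:
    """Approximate counts over the most recent `window` elements.

    Illustrative count-min style table, not the FFIDDS algorithm itself.
    """

    def __init__(self, window, width=256, depth=4):
        self.window = window
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.recent = deque()  # the retained recent elements, oldest first

    def _cells(self, item):
        # One cell per row, chosen by a deterministic hash salted with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                repr(item).encode(), salt=bytes([row]), digest_size=8
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def _update(self, item, delta):
        for row, col in self._cells(item):
            self.table[row][col] += delta

    def add(self, item):
        # Increment for the arriving element ...
        self.recent.append(item)
        self._update(item, +1)
        # ... and decrement for the element that falls out of the window.
        if len(self.recent) > self.window:
            self._update(self.recent.popleft(), -1)

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum over rows
        # never falls below the true count within the window.
        return min(self.table[row][col] for row, col in self._cells(item))

    def frequent(self, threshold):
        # Items whose estimated share of the window reaches `threshold`.
        n = len(self.recent)
        return {x for x in set(self.recent) if self.estimate(x) >= threshold * n}


counter = WindowedHashCounter(window=5)
for element in ["a", "a", "b", "a", "c", "a", "d", "a"]:
    counter.add(element)
print(counter.frequent(0.5))
```

Because every decrement cancels an earlier increment, the estimates stay overestimates of the in-window counts, which is one way to obtain an approximate value with a one-sided error. A fixed-size table like this can also be summed cell by cell across nodes, which is a common reason to use one when bandwidth is limited; whether FFIDDS does so is not stated in the abstract.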

Abstract: Traditional methods of mining frequent elements in distributed data streams tend to result in excessive communication within layers, and bandwidth is a bottleneck. To minimize communication requirements, we propose a method of sampling from distributed da

Key words: Distributed data stream, Frequent items, Algorithm

Research on Methods for Finding Recent Frequent Items in Distributed Data Streams[J]. Computer Science (计算机科学), 2008, 35(3): 206-208. https://doi.org/
