一种基于Chernoff Bound的数据流上近似频繁项集的挖掘方法

计算机科学 ›› 2011, Vol. 38 ›› Issue (5): 164-168.

一种基于Chernoff Bound的数据流上近似频繁项集的挖掘方法

李海峰,章宁

(中央财经大学信息学院北京100081)

出版日期:2018-11-16 发布日期:2018-11-16
基金资助:
本文受中央财经大学“211工程”三期资助。

Chernoff Bound Based Approximate Frequent Itemset Mining Method over Streams

LI Hai-feng,ZHANG Ning

Online:2018-11-16 Published:2018-11-16

摘要/Abstract

摘要： 数据流高速、无限和动态的特点决定了必须在有限的内存中以尽快的计算速度完成流数据上的频繁项集挖掘。将数据流中的数据按照段进行划分，采用二元组列表的数据结构进行保存，提出了一种基于滑动窗口的近似频繁项集挖掘方法AFIoDS，以实时获取频繁项集集合的真子集，并引入了概率参数，利用Chernoff Bound来动态改变支持度的近似值，保证真子集中的频繁项集被限制在一定的误差范围之内。此外，为了进一步节省内存，AFIoDS采用闭合项集的形式压缩每个段中获取的频繁项集。通过在3种真实数据集上的实验表明，AFIoDS算法与现有算法相比，在精度没有下降的情况下，具有更快的处理速度，同时其存储开销大大降低。

关键词: Chernoff Bound，数据流，频繁项集

Abstract: A data stream is fast, unlimited and dynamic, these characteristics constraint the computational resources and storages when mining frectuent itemsets. This paper addressed this problem and proposed a simple and effective algorithm AFIoDS, AFIoDS is an approximate algorithm based on sliding window model, which splits stream data into batches and maintains them with 2-tuple lists;thus,a false negative result can be obtained using a probabilistic parameterbased on chernoff bound. The approximation will be changed dynamically to guarantee the mining frequent itemsets are error controllable. Plus, a compression of frequent itemsets, the closed frequent itemsets, arc employed to represent the results of each batch for further memory saving. Our experimental results on 3 real world data show that without precision reduction, AFIoDS achieves a faster speed and a much reduced memory cost in comparison with the statcof-thcart algorithms.

Key words: Chernoff bound, Data stream, Frequent itemset

李海峰,章宁. 一种基于Chernoff Bound的数据流上近似频繁项集的挖掘方法[J]. 计算机科学, 2011, 38(5): 164-168. https://doi.org/

LI Hai-feng,ZHANG Ning. Chernoff Bound Based Approximate Frequent Itemset Mining Method over Streams[J]. Computer Science, 2011, 38(5): 164-168. https://doi.org/

参考文献

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed