Computer Science ›› 2009, Vol. 36 ›› Issue (11): 148-151.

• Software Engineering and Database Technology •

A Mixture-Model Data Stream Clustering Algorithm Based on Synopsis Techniques

刘建伟,李卫民   

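  1. (Research Institute of Automation, China University of Petroleum (Beijing), Beijing 102249); (School of Computer Engineering and Science, Shanghai University, Shanghai 200072)
  • Publication date: 2018-11-16 Release date: 2018-11-16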

Synopsis Data Structure Based Mixture Probabilistic Density Data Stream Clustering Approach

LIU Jian-wei, LI Wei-ming   

  • Online:2018-11-16 Published:2018-11-16

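Abstract: It is widely recognized that traditional database management systems and data query algorithms do not support queries over data streams well, so new pattern query algorithms for data streams need to be studied. This paper proposes a fast online mixture-model clustering algorithm for data streams based on synopsis techniques; the algorithm is a staged mixture-model clustering process. It first partitions the initially arriving stream data with a multidimensional grid structure and summarizes each resulting cell, extracting sufficient statistics. A model-based greedy clustering algorithm is then run on these synopses, and the synopsis information of the resulting mixture model is stored in a persistent synopsis database, forming the initial clustering mixture model. While the clustering model is being maintained, each newly arriving data block is partitioned with the same kind of multidimensional grid structure, and sufficient synopsis information is extracted for each resulting cell. The model-based greedy clustering algorithm is run on these synopses to form a clustering mixture model. Three merging criteria are proposed for deciding whether a newly arrived model can be merged into the existing mixture model. Experiments show that the algorithm reduces the classification error and runs much faster than the traditional model-based greedy clustering algorithm.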

Keywords: data stream, mixture model, clustering, pattern
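
The grid-based summarization step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which statistics are kept per cell, so the count, per-dimension sum, and per-dimension sum of squares used here are an assumption, as are the names `CellSummary`, `summarize_chunk`, and the uniform `cell_width` parameter.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass
class CellSummary:
    """Assumed sufficient statistics for the points that fall into one grid cell."""
    count: int = 0
    linear_sum: List[float] = field(default_factory=list)
    square_sum: List[float] = field(default_factory=list)

    def add(self, point: Sequence[float]) -> None:
        # Lazily size the accumulators on the first point.
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
            self.square_sum = [0.0] * len(point)
        self.count += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def mean(self) -> List[float]:
        return [s / self.count for s in self.linear_sum]

    def variance(self) -> List[float]:
        # Per-dimension (diagonal) variance: E[x^2] - E[x]^2.
        means = self.mean()
        return [sq / self.count - m * m for sq, m in zip(self.square_sum, means)]


def summarize_chunk(
    points: Iterable[Sequence[float]], cell_width: float
) -> Dict[Tuple[int, ...], CellSummary]:
    """Partition one data block with a uniform grid and summarize every occupied cell."""
    cells: Dict[Tuple[int, ...], CellSummary] = defaultdict(CellSummary)
    for p in points:
        # The cell index is the integer grid coordinate in each dimension.
        key = tuple(math.floor(x / cell_width) for x in p)
        cells[key].add(p)
    return dict(cells)
```

Once a block has been summarized, the raw points can be discarded; a clustering step would then operate on the per-cell summaries rather than on individual records.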

Abstract: Many current and emerging applications require support for on-line analysis of rapidly changing data streams. The limitations of traditional DBMSs and data mining techniques in supporting streaming applications have been recognized, prompting research to augment existing technologies, build new systems to manage streaming data, and propose new algorithms for mining data streams. A synopsis data structure based mixture probabilistic density data stream clustering approach is proposed, which requires only the newly arrived data, not the entire historical data, to be held in memory. The approach incrementally updates the density estimate using only the newly arrived data and the previously estimated density. It uses three distance metric criteria to decide whether a newly arrived component should be merged into a component of the existing Gaussian mixture model or added to the model as a new component. The experimental results demonstrate that the algorithm is feasible and achieves high-quality clustering results.

Key words: Data stream, Mixture model clustering, Patterns
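
The merge-or-add decision described in the abstract can be sketched as follows. The abstract does not state the three distance criteria, so the variance-normalized squared distance between means used here is a hypothetical stand-in and not one of the paper's criteria. The diagonal covariance, the moment-matching merge, and the names `Component`, `merge`, `distance`, and `update_model` are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float      # number of points represented (unnormalized mixing weight)
    mean: List[float]
    var: List[float]   # diagonal covariance, an assumption of this sketch


def merge(a: Component, b: Component) -> Component:
    """Moment-matching merge: preserves total weight, mean, and second moment."""
    w = a.weight + b.weight
    mean = [(a.weight * ma + b.weight * mb) / w for ma, mb in zip(a.mean, b.mean)]
    var = [
        (a.weight * (va + ma * ma) + b.weight * (vb + mb * mb)) / w - m * m
        for va, ma, vb, mb, m in zip(a.var, a.mean, b.var, b.mean, mean)
    ]
    return Component(w, mean, var)


def distance(a: Component, b: Component, eps: float = 1e-12) -> float:
    """Hypothetical criterion: squared mean distance scaled by the summed variances."""
    return sum(
        (ma - mb) ** 2 / (va + vb + eps)
        for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var)
    )


def update_model(
    model: List[Component], new_components: List[Component], threshold: float
) -> List[Component]:
    """Merge each new component into its nearest existing one, or add it as new."""
    for c in new_components:
        if model:
            best = min(range(len(model)), key=lambda i: distance(model[i], c))
            if distance(model[best], c) <= threshold:
                model[best] = merge(model[best], c)
                continue
        model.append(c)
    return model
```

Only the existing components and the components fitted to the newest block are needed for the update, which matches the abstract's claim that the full history need not be kept in memory. The `threshold` value is a tuning parameter of this sketch.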
