Computer Science (计算机科学) ›› 2010, Vol. 37 ›› Issue (5): 157-162.

• Databases and Data Mining •

基于衰减模型的混合属性数据流离群检测

苏晓坷,兰洋,秦玉明,程耀东   

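  1. (College of Information Science and Technology, Donghua University, Shanghai 201620); (College of Computer and Information Technology, Xinyang Normal University, Xinyang 464000); (Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049)
  • Publication date: 2018-12-01  Release date: 2018-12-01
  • Funding:
    This work was supported by the National High-Tech Research and Development Program of China (863 Program) (2006AA01A120) and the National Natural Science Foundation of China (10871040).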

Outlier Detection Based on the Damped Model in Mixed Data Streams

SU Xiao-ke, LAN Yang, QIN Yu-ming, CHENG Yao-dong

  • Online:2018-12-01 Published:2018-12-01

Abstract: Outlier detection in data streams is challenging because memory is limited and detection must run in real time. This paper introduces a fast outlier detection algorithm for mixed-attribute data streams. The stream is clustered incrementally under the damped model, producing a set of cluster features that represents the data distribution, and the radius threshold changes dynamically. When a detection request is received, an outlier factor is computed for each cluster that satisfies the specified conditions, and clusters with high outlier factors are reported as outlier clusters. A method is also proposed to distinguish outlier clusters from the initial stage of data evolution. The time and space complexity of the algorithm are nearly linear in the size of the data stream. Experiments on the real-world KDDCUP99 dataset show that the algorithm effectively detects outliers in mixed-attribute data streams.

Key words: Mixed attributes, Data streams, Incremental clustering, Outlier detection, Damped model
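
The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea of decay-weighted incremental clustering followed by a per-cluster outlier factor. Everything concrete in it is an assumption made for illustration: the names `DecayedCluster`, `cluster_stream`, and `outlier_factors`, the decay function 2^(-lam * dt), the distance measure, the fixed `radius` (the paper's threshold changes dynamically), and the outlier factor itself. It also omits the paper's step for telling outlier clusters apart from newly evolving data.

```python
class DecayedCluster:
    """Hypothetical decay-weighted cluster feature for mixed attributes."""

    def __init__(self, numeric, categorical, t):
        self.weight = 1.0                       # decayed number of points
        self.num_sum = list(numeric)            # decayed sums of numeric attributes
        self.cat_counts = [{v: 1.0} for v in categorical]  # decayed value counts
        self.last_update = t

    def decay(self, t, lam):
        # Assumed damped model: every statistic fades by 2^(-lam * elapsed).
        factor = 2.0 ** (-lam * (t - self.last_update))
        self.weight *= factor
        self.num_sum = [s * factor for s in self.num_sum]
        for counts in self.cat_counts:
            for value in counts:
                counts[value] *= factor
        self.last_update = t

    def absorb(self, numeric, categorical, t, lam):
        self.decay(t, lam)
        self.weight += 1.0
        self.num_sum = [s + x for s, x in zip(self.num_sum, numeric)]
        for counts, value in zip(self.cat_counts, categorical):
            counts[value] = counts.get(value, 0.0) + 1.0

    def representative(self):
        # Numeric centroid plus the most frequent value of each categorical attribute.
        centroid = [s / self.weight for s in self.num_sum]
        modes = [max(counts, key=counts.get) for counts in self.cat_counts]
        return centroid, modes

    def distance(self, numeric, categorical):
        # Assumed measure: absolute difference to the centroid for numeric
        # attributes (expected to be scaled to [0, 1]) and one minus the
        # relative frequency of the value for categorical attributes.
        centroid = [s / self.weight for s in self.num_sum]
        num_part = sum(abs(x - m) for x, m in zip(numeric, centroid))
        cat_part = sum(1.0 - counts.get(v, 0.0) / self.weight
                       for counts, v in zip(self.cat_counts, categorical))
        return (num_part + cat_part) / (len(numeric) + len(categorical))


def cluster_stream(points, radius=0.3, lam=0.01):
    """Single pass over (numeric, categorical) pairs; the index is the timestamp."""
    clusters = []
    for t, (numeric, categorical) in enumerate(points):
        best, best_d = None, float("inf")
        for c in clusters:
            c.decay(t, lam)
            d = c.distance(numeric, categorical)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= radius:
            best.absorb(numeric, categorical, t, lam)
        else:
            clusters.append(DecayedCluster(numeric, categorical, t))
    return clusters


def outlier_factors(clusters):
    """Assumed factor: weight-share-weighted distance to all other clusters."""
    total = sum(c.weight for c in clusters)
    factors = []
    for i, c in enumerate(clusters):
        centroid, modes = c.representative()
        score = sum(o.weight * o.distance(centroid, modes)
                    for j, o in enumerate(clusters) if j != i)
        factors.append(score / total)
    return factors
```

Under these assumptions, a light cluster that lies far from the heavy clusters receives a high factor, while a heavy cluster receives a low one. Each arriving point is compared only with the current cluster summaries, which keeps the per-point cost proportional to the number of clusters rather than to the length of the stream.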
