Computer Science ›› 2012, Vol. 39 ›› Issue (1): 152-155.

• Database and Data Mining •

Data Stream Concept Drift Detection Method Based on Mixture Ensemble Method

GUI Lin, ZHANG Yuhong, HU Xuegang

  1. (School of Computer and Information, Hefei University of Technology, Hefei 230009)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Classification of data streams has attracted wide attention in recent years, and concept drift detection is one of its key problems. Existing classification models are either ensembles built on a single type of base classifier or ensembles built on hybrid base classifiers, and their drift detection mechanisms mostly rely on idealized distribution assumptions (the stationarity and learnability assumptions). Single-model ensembles can amplify classification error and lose accuracy on noisy data streams, while hybrid ensembles often struggle to balance classification accuracy against time cost. Motivated by this, we propose WE-DTB, an ensemble classification method that extends the simple WE ensemble framework with hybrid base models built from decision trees and Naive Bayes. It uses two typical concept drift detection mechanisms, Hoeffding bounds and the μ test, to detect concept drift and to classify data streams. Extensive experiments show that WE-DTB detects concept drift effectively while maintaining good classification accuracy and good time and space performance.

Key words: Data streams, Concept drifts, Classification, Noise
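
The abstract names Hoeffding bounds as one of the drift detection mechanisms but does not describe how WE-DTB applies them. The sketch below is therefore only a generic illustration, not the authors' procedure: it computes the standard Hoeffding bound and uses it to compare the error rate of a reference window with that of a recent window. The function names, the window-comparison rule, and the default `delta` are assumptions made for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that the mean of n independent observations with
    the given value range overshoots the true mean by more than epsilon
    with probability at most delta (one-sided Hoeffding inequality)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def drift_suspected(old_errors, new_errors, delta=0.05):
    """Hypothetical rule: flag drift when the error rate on the recent window
    exceeds the error rate on the reference window by more than the sum of
    the two Hoeffding bounds. If both windows come from the same
    distribution, this fires with probability at most 2 * delta.

    Each argument is a non-empty sequence of 0/1 flags
    (1 = misclassified example)."""
    old_rate = sum(old_errors) / len(old_errors)
    new_rate = sum(new_errors) / len(new_errors)
    # 0/1 errors have value range 1.0
    eps_old = hoeffding_bound(1.0, delta, len(old_errors))
    eps_new = hoeffding_bound(1.0, delta, len(new_errors))
    return (new_rate - old_rate) > (eps_old + eps_new)


if __name__ == "__main__":
    reference = [0] * 180 + [1] * 20   # 10% error on the reference window
    recent = [0] * 100 + [1] * 100     # 50% error on the recent window
    print(drift_suspected(reference, recent))
```

The bound shrinks as the windows grow, so larger windows let smaller increases in error rate be flagged at the same confidence level.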
