Computer Science ›› 2016, Vol. 43 ›› Issue (12): 173-178. doi: 10.11896/j.issn.1002-137X.2016.12.031

• Data Mining •

Data Stream Classification Algorithm Based on Kappa Coefficient

XU Shu-liang, WANG Jun-hong

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61202018)

Abstract: Data stream mining has become one of the hot topics in data mining. Because of concept drift, conventional classification algorithms cannot be applied directly to data streams. To cope with concept drift in data streams, a classification algorithm based on the Kappa coefficient is proposed. The approach uses ensemble classification with a weighted voting strategy to decide the labels of test sets, and it employs the Kappa coefficient both to measure the performance of the classification system and to adjust the classifiers dynamically. When the performance of the classifiers decreases significantly, a concept drift alarm is raised and the algorithm uses prior knowledge to quickly delete the classifiers that no longer meet the requirement, so that the ensemble adapts to the new concept. Experimental results show that, compared with the BWE, AE and AWE algorithms used as baselines, the proposed approach not only achieves better classification performance but also reduces time cost to some extent.

Key words: Data streams, Concept drift, Classification, Kappa coefficient
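
The abstract names three ingredients: the Kappa coefficient as a performance measure, weighted voting over an ensemble, and deletion of classifiers that no longer fit the current concept. The sketch below illustrates each one in isolation. It is not the authors' algorithm: the abstract does not give the drift test, the weighting rule or the base learner, so the function names, the `min_kappa` threshold and the use of plain callables as ensemble members are all assumptions made for illustration. The Kappa coefficient itself is the standard Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance.

```python
from collections import Counter


def kappa(y_true, y_pred):
    """Cohen's kappa between true labels and predicted labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of instances classified correctly.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate block with a single class that is always predicted;
        # returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per ensemble member for a single instance.
    weights: one non-negative weight per member (hypothetical, e.g. its kappa).
    """
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


def prune_ensemble(members, X_block, y_block, min_kappa=0.0):
    """Keep members whose kappa on the latest labeled block exceeds min_kappa.

    members: list of callables mapping one instance to a label (assumption).
    min_kappa: hypothetical threshold; the paper's actual criterion is not
    stated in the abstract.
    Returns a list of (member, kappa) pairs for the members that are kept.
    """
    kept = []
    for predict in members:
        k = kappa(y_block, [predict(x) for x in X_block])
        if k > min_kappa:
            kept.append((predict, k))
    return kept
```

In use, a stream would be processed block by block: each new labeled block is scored with `kappa`, members falling below the threshold are dropped by `prune_ensemble`, and the surviving members label the next block through `weighted_vote`, with each member's kappa serving as its weight. Using kappa rather than raw accuracy discounts agreement that a classifier could reach by chance on class-imbalanced blocks.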

