Computer Science, 2016, Vol. 43, Issue (12): 173-178. doi: 10.11896/j.issn.1002-137X.2016.12.031


Data Stream Classification Algorithm Based on Kappa Coefficient

XU Shu-liang and WANG Jun-hong   

Online: 2018-12-01; Published: 2018-12-01

Abstract: Data stream mining has become one of the hot topics in data mining. Because of concept drift, conventional classification algorithms cannot be applied directly in a data stream environment. To deal with concept changes in data streams, an algorithm based on the Kappa coefficient was proposed. The approach uses ensemble classification and a weighted voting strategy to decide the labels of the test set, and it employs the Kappa coefficient to measure the performance of the classification system. When the performance of the classifiers decreases significantly, a concept drift alarm is raised and the algorithm applies prior knowledge to delete inaccurate classifiers so that the ensemble adapts to the new concept. The experimental results show that, compared with the baseline algorithms BWE (Batch Weighted Ensemble), AE (Aggregate Ensemble) and AWE (Accuracy Weighted Ensemble), the new approach not only achieves better classification performance but also reduces time cost.

Key words: Data streams, Concept drift, Classification, Kappa coefficient
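The abstract names three mechanisms: the Kappa coefficient as the performance measure, weighted voting over ensemble members, and a drift alarm that prunes weak members. The Python sketch below illustrates one way these pieces could fit together; it is not the authors' algorithm. The chunk-based processing, the alarm rule (ensemble Kappa falling by more than a hypothetical `drop` threshold between consecutive chunks), the pruning rule (hypothetical `min_kappa`), weighting members by their Kappa on the newest chunk, and the names `KappaEnsemble` and `process_chunk` are all assumptions. Kappa itself is Cohen's statistic, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected from a chance classifier with the same label proportions.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Optional, Sequence

Label = Hashable
# Hypothetical interface: a trained base classifier is any callable mapping an instance to a label.
Classifier = Callable[[object], Label]


def kappa(y_true: Sequence[Label], y_pred: Sequence[Label]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), over one labelled chunk."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement (accuracy)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected from a chance classifier with the same label marginals.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; treating it as "no better than chance" is an assumption
    return (p_o - p_e) / (1.0 - p_e)


def weighted_vote(members: List[Classifier], weights: List[float], x: object) -> Label:
    """Return the label with the largest summed member weight."""
    scores = defaultdict(float)
    for clf, w in zip(members, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


class KappaEnsemble:
    """Chunk-based ensemble; thresholds and weighting rules are assumptions, not the paper's."""

    def __init__(self, max_size: int = 10, drop: float = 0.2, min_kappa: float = 0.0):
        self.members: List[Classifier] = []
        self.weights: List[float] = []
        self.max_size = max_size    # hypothetical cap on ensemble size
        self.drop = drop            # hypothetical: alarm when ensemble kappa falls by more than this
        self.min_kappa = min_kappa  # hypothetical: on alarm, drop members scoring below this
        self.last_kappa: Optional[float] = None

    def predict(self, x: object) -> Label:
        if not self.members:
            raise RuntimeError("ensemble is empty")
        return weighted_vote(self.members, self.weights, x)

    def process_chunk(self, xs: Sequence[object], ys: Sequence[Label],
                      new_member: Classifier) -> bool:
        """Test on a labelled chunk, react to drift, then add a member trained on it.

        Returns True if a drift alarm was raised on this chunk.
        """
        drift = False
        if self.members:
            ens_kappa = kappa(ys, [self.predict(x) for x in xs])
            member_kappa = [kappa(ys, [m(x) for x in xs]) for m in self.members]
            if self.last_kappa is not None and self.last_kappa - ens_kappa > self.drop:
                drift = True
                keep = [i for i, k in enumerate(member_kappa) if k >= self.min_kappa]
                self.members = [self.members[i] for i in keep]
                member_kappa = [member_kappa[i] for i in keep]
            # Re-weight the remaining members by their kappa on the newest chunk (negative -> 0).
            self.weights = [max(k, 0.0) for k in member_kappa]
            self.last_kappa = ens_kappa
        # Assumption: the caller trains new_member on (xs, ys); it starts with the maximal weight 1.0.
        self.members.append(new_member)
        self.weights.append(1.0)
        if len(self.members) > self.max_size:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[worst], self.weights[worst]
        return drift
```

The sketch is driven by calling `process_chunk(xs, ys, clf)` once per labelled chunk and `predict(x)` for incoming instances. Weighting by Kappa rather than raw accuracy discounts chance agreement: on a chunk with balanced labels, a member that always predicts one class is 50% accurate but scores Kappa 0, so it receives no vote weight here.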
