Computer Science ›› 2017, Vol. 44 ›› Issue (6): 255-259. doi: 10.11896/j.issn.1002-137X.2017.06.044

Online Data Stream Mining for Seriously Unbalanced Applications

ZHAO Qiang-li and JIANG Yan-huang   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Training an ensemble of classifiers on sequential blocks of training instances is a popular strategy for mining data streams with concept drift. In seriously unbalanced applications, however, the number of examples per class within a data block differs greatly, and traditional data block creation yields low accuracy on the small classes, which have far fewer instances. This paper proposes UMAE (Unbalanced data learning based on MAE), an updating algorithm for seriously unbalanced applications built on MAE (Memorizing based Adaptive Ensemble). UMAE maintains an equal-sized sliding window for each class. When a data block arrives, each example in it enters the sliding window corresponding to its class. During learning, a new data block is created from the instances currently in the sliding windows, and this new block is used to train a new classifier. In a comparison with five traditional data stream mining approaches, UMAE achieves high accuracy on seriously unbalanced applications, especially on the small classes with far fewer instances.
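
The per-class sliding windows described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `ClassWindows`, its methods, and the `window_size` parameter are hypothetical, and classifier training and the MAE ensemble update are omitted.

```python
from collections import defaultdict, deque


class ClassWindows:
    """Hypothetical helper: one equal-sized sliding window per class label."""

    def __init__(self, window_size):
        self.window_size = window_size
        # A deque with maxlen drops its oldest example once the window is full.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def absorb(self, block):
        """Route each (features, label) example into the window of its class."""
        for features, label in block:
            self.windows[label].append((features, label))

    def training_block(self):
        """Build a new data block from the current contents of all windows."""
        return [example for window in self.windows.values() for example in window]


# Two incoming blocks; the second contains no "minor" examples at all.
windows = ClassWindows(window_size=3)
windows.absorb([((0.1,), "major")] * 5 + [((0.9,), "minor")])
windows.absorb([((0.2,), "major")] * 6)

rebalanced = windows.training_block()
minor_count = sum(1 for _, label in rebalanced if label == "minor")
major_count = sum(1 for _, label in rebalanced if label == "major")
```

In this sketch the minority example survives in its window even when a later block contains none, while the majority class is capped at `window_size`, so the block handed to the next classifier is less skewed than the raw stream.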

Key words: Online learning, Data stream mining, Recalling and forgetting mechanisms, Unbalanced data learning

