Computer Science ›› 2017, Vol. 44 ›› Issue (6): 255-259. doi: 10.11896/j.issn.1002-137X.2017.06.044
赵强利,蒋艳凰
ZHAO Qiang-li and JIANG Yan-huang
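Abstract: Ensemble-based data stream mining is an important approach to learning from data streams that exhibit concept drift. In applications whose class distribution is severely imbalanced, the chunk-based learning used in ensemble data stream mining yields high classification accuracy for classes with many samples and low accuracy for classes with few samples, so existing algorithms cannot meet the needs of such applications. To address this problem, the memorizing-based ensemble data stream learning algorithm MAE (Memorizing based Adaptive Ensemble) is improved, and an online data stream learning algorithm for severely imbalanced classes, UMAE (Unbalanced data Learning based on MAE), is proposed. UMAE maintains a sliding window of samples for each class. When a new data chunk arrives, each of its samples enters the sliding window of its own class, and the samples held in the per-class windows are then used to build the data chunk for online learning. A comparison with five typical data stream mining algorithms shows that UMAE, while meeting real-time requirements, achieves high overall classification accuracy and greatly improves accuracy on minority classes with very few samples. For applications with severely imbalanced class distributions, such as anomaly detection, UMAE is markedly more practical than the other algorithms.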
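The per-class sliding-window step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PerClassWindows`, the single shared `window_size`, and the choice to build the training chunk by simply concatenating all windows are assumptions made for illustration, since the abstract does not give these details.

```python
from collections import defaultdict, deque


class PerClassWindows:
    """Hypothetical helper: one fixed-size sliding window per class label."""

    def __init__(self, window_size):
        # Assumption: every class uses the same window capacity.
        self.window_size = window_size
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def add_chunk(self, chunk):
        """Route each (features, label) sample of a new chunk to its class window."""
        for features, label in chunk:
            # When a window is full, deque(maxlen=...) drops its oldest sample.
            self.windows[label].append((features, label))

    def build_training_chunk(self):
        """Assumption: the training chunk is the concatenation of all windows."""
        return [sample for window in self.windows.values() for sample in window]


windows = PerClassWindows(window_size=3)
# First chunk: three majority-class samples and one minority-class sample.
windows.add_chunk([([0.1], "normal"), ([0.2], "normal"),
                   ([9.0], "attack"), ([0.3], "normal")])
# Second chunk: no minority-class samples at all.
windows.add_chunk([([0.4], "normal"), ([0.5], "normal")])
training_chunk = windows.build_training_chunk()
```

In this sketch the minority sample from the first chunk is still present in `training_chunk` after the second chunk arrives, because only the majority-class window has rolled forward. That is the intuition behind building the training chunk from the windows instead of from the raw incoming chunk.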