Computer Science ›› 2019, Vol. 46 ›› Issue (1): 64-72. doi: 10.11896/j.issn.1002-137X.2019.01.010

• CCDM2018 •

Multi-source Online Transfer Learning Algorithm for Classification of Data Streams with Concept Drift

QIN Yi-xiu1, WEN Yi-min1,2, HE Qian1   

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi Key Laboratory of Trustworthy Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2018-06-02 Online: 2019-01-15 Published: 2019-02-25

Abstract: Existing algorithms for classifying data streams with concept drift typically train a new classifier on newly collected data when a new concept is detected, and discard the historical models. This strategy often leaves the classifier insufficiently trained for a short period, because not enough training data for the new concept has been collected in the initial stage. Furthermore, some existing online transfer learning algorithms for classifying data streams with concept drift exploit only a single source domain, which sometimes leads to poor classification accuracy when the historical concepts differ from the new concept. To address these problems, this paper proposes a multi-source online transfer learning algorithm for classifying data streams with concept drift (CMOL), which can use knowledge from multiple historical classifiers. CMOL adopts a dynamic classifier weight adjustment mechanism and updates the classifier pool according to the weights of the classifiers in it. Experiments show that, when concept drift occurs, CMOL adapts to the new concept faster than the compared methods and achieves higher classification accuracy.

Key words: Concept drift, Data stream classification, Multi-source transfer learning, Online learning

CLC Number: TP391
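
The abstract names two mechanisms: dynamic adjustment of classifier weights, and updating the classifier pool according to those weights. The abstract does not give the update rules, so the following is only a minimal illustrative sketch, not the CMOL algorithm itself. It assumes binary labels in {+1, -1}, a multiplicative penalty `beta` for a misclassifying member, and eviction of the lowest-weight member when the pool is full. The names `Member`, `WeightedPool`, `beta`, and `max_size` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A predictor maps a feature vector to a label in {+1, -1} (assumption).
Predictor = Callable[[Sequence[float]], int]


@dataclass
class Member:
    predict: Predictor
    weight: float


class WeightedPool:
    """Hypothetical weighted pool of historical classifiers (not CMOL's rules)."""

    def __init__(self, max_size: int = 3, beta: float = 0.5) -> None:
        self.max_size = max_size  # assumed pool capacity
        self.beta = beta          # assumed penalty factor in (0, 1)
        self.members: List[Member] = []

    def _normalize(self) -> None:
        total = sum(m.weight for m in self.members)
        for m in self.members:
            m.weight /= total

    def add(self, predict: Predictor) -> None:
        # Pool update: when full, evict the member with the lowest weight.
        if len(self.members) >= self.max_size:
            worst = min(range(len(self.members)),
                        key=lambda i: self.members[i].weight)
            del self.members[worst]
        # Assumption: a newcomer starts at the average weight of the pool.
        if self.members:
            start = sum(m.weight for m in self.members) / len(self.members)
        else:
            start = 1.0
        self.members.append(Member(predict, start))
        self._normalize()

    def predict(self, x: Sequence[float]) -> int:
        # Weighted vote over all members; ties resolve to +1.
        score = sum(m.weight * m.predict(x) for m in self.members)
        return 1 if score >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> None:
        # Dynamic weight adjustment: shrink the weight of each member
        # that misclassifies the newly labeled instance (x, y).
        for m in self.members:
            if m.predict(x) != y:
                m.weight *= self.beta
        self._normalize()
```

In this sketch, a member that keeps misclassifying instances of the new concept loses influence in the vote and becomes the first candidate for eviction when a new classifier is added.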