Computer Science (计算机科学), 2019, Vol. 46, Issue (1): 64-72. doi: 10.11896/j.issn.1002-137X.2019.01.010
QIN Yi-xiu (秦一休)1, WEN Yi-min (文益民)1,2, HE Qian (何倩)1
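Abstract: After detecting a concept drift, existing concept drift handling algorithms usually retrain a classifier on the newly arrived concept and "forget" the classifiers trained earlier. In the early stage of a drift, few samples of the new concept are available, so the newly built classifier cannot be trained sufficiently in a short time and its classification performance is usually poor. Furthermore, existing data stream classification algorithms based on online transfer learning can use the knowledge of only a single classifier to assist learning on the new concept; when the historical concept is not very similar to the new concept, the classification accuracy of the model is unsatisfactory. To address these problems, this paper proposes CMOL, a data stream classification algorithm that can exploit the knowledge of multiple historical classifiers. CMOL adopts a dynamic classifier weight adjustment mechanism and updates the classifier pool according to the classifier weights, so that the pool covers as many concepts as possible. Experiments show that, compared with other related algorithms, CMOL adapts to the new concept faster when concept drift occurs and achieves higher classification accuracy.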
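The abstract names two mechanisms, dynamic classifier weights and weight-driven pool updates, without giving their formulas. The following is a minimal sketch of that general idea, not the CMOL algorithm itself: the multiplicative penalty `beta`, the pool `capacity`, the rescaling step, and the "evict the lowest-weight member" rule are all assumptions borrowed from weighted-majority-style ensembles, and the names `WeightedPool`, `add`, `predict`, and `update` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical illustration only: the abstract does not specify CMOL's
# update rule, so the penalty and eviction policy below are assumptions.

Classifier = Callable[[float], int]  # maps a feature value to a label in {-1, +1}


@dataclass
class WeightedPool:
    capacity: int = 3   # assumed maximum number of classifiers kept
    beta: float = 0.5   # assumed penalty factor applied after a mistake
    members: List[Classifier] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)

    def add(self, clf: Classifier) -> None:
        """Add a classifier; if the pool is full, evict the lowest-weight one."""
        if len(self.members) >= self.capacity:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[worst]
            del self.weights[worst]
        self.members.append(clf)
        self.weights.append(1.0)

    def predict(self, x: float) -> int:
        """Weighted vote over all pool members."""
        score = sum(w * clf(x) for w, clf in zip(self.weights, self.members))
        return 1 if score >= 0 else -1

    def update(self, x: float, y: int) -> None:
        """Penalize members that misclassify (x, y), then rescale weights."""
        for i, clf in enumerate(self.members):
            if clf(x) != y:
                self.weights[i] *= self.beta
        top = max(self.weights)  # rescale so the best member has weight 1.0
        self.weights = [w / top for w in self.weights]


# Usage: two historical classifiers, one matching the current concept.
pool = WeightedPool(capacity=2)
pool.add(lambda x: 1 if x > 0 else -1)   # agrees with the current concept
pool.add(lambda x: -1 if x > 0 else 1)   # encodes the opposite concept
for x in [1.0, 2.0, -1.0]:
    pool.update(x, 1 if x > 0 else -1)
print(pool.weights, pool.predict(3.0))
```

In this sketch a classifier that keeps disagreeing with the labeled stream loses influence and becomes the first candidate for eviction, which leaves room in the pool for classifiers trained on other concepts. How CMOL actually computes and uses its weights is defined in the paper, not here.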