Computer Science, 2024, Vol. 51, Issue (4): 124-131. doi: 10.11896/jsjkx.230300023
康伟, 黎利辉, 文益民
KANG Wei, LI Lihui, WEN Yimin
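Abstract: In semi-supervised classification of data streams with concept drift, only a small fraction of the data is labeled. This poses major challenges for training classifiers, detecting concept drift, and adapting classifiers to new concepts. Existing semi-supervised cluster-based classification algorithms only apply simple incremental updates to the cluster models in the classifier pool and fail to reuse historical cluster models effectively. This paper therefore proposes a new semi-supervised classification algorithm based on cluster model reuse, called CDCMR. First, the data stream arrives in the form of data chunks; after a chunk has been classified, a cluster model whose number of clusters is determined adaptively is trained on it. Second, the similarity between each component classifier in the classifier pool and the cluster model is computed, and several component classifiers are selected. Third, the selected component classifiers are reused with the current data chunk and then ensembled with the cluster model. Next, the classifier pool is divided into a new-replaces-old pool and a diversity-maximizing pool, and both are updated. Finally, the samples of the next data chunk are classified by the ensemble. Experiments on several synthetic and real-world datasets show that the proposed algorithm adapts effectively to concept drift and achieves a significant performance improvement over existing methods.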
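The chunk-by-chunk flow described in the abstract can be illustrated with a small sketch. This is not the CDCMR algorithm itself: the abstract does not specify the clustering method, the similarity measure, the reuse step, or the pool-update rules, so everything below is a stand-in chosen for illustration. It assumes a toy k-means with a fixed `k` (not adaptively determined), prediction agreement on the current chunk as the similarity measure, an unweighted majority vote, and a single first-in-first-out pool. The model-reuse step and the diversity-maximizing pool are omitted. All function names are hypothetical.

```python
import math
from collections import Counter, deque


def nearest(centroids, x):
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))


def train_cluster_model(points, labels, k=2, iters=10):
    """Toy k-means; each cluster takes the majority label of its labeled points.

    labels[i] is None for unlabeled samples. k is fixed here (an assumption;
    the paper determines the number of clusters adaptively).
    """
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialisation
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        if y is not None:
            votes[nearest(centroids, p)][y] += 1
    # Keep only clusters that received at least one label.
    return [(c, v.most_common(1)[0][0]) for c, v in zip(centroids, votes) if v]


def predict(model, x):
    """Label of the nearest labeled cluster (model must be non-empty)."""
    return model[nearest([c for c, _ in model], x)][1]


def similarity(model_a, model_b, points):
    """Fraction of points on which two models agree (hypothetical measure)."""
    agree = sum(predict(model_a, p) == predict(model_b, p) for p in points)
    return agree / len(points)


def ensemble_predict(models, x):
    """Unweighted majority vote over the component models."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


def process_chunk(pool, points, labels, top_k=2):
    """Train on one chunk, pick the most similar pool members, return the ensemble."""
    new_model = train_cluster_model(points, labels)
    ranked = sorted(pool, key=lambda m: similarity(m, new_model, points), reverse=True)
    ensemble = ranked[:top_k] + [new_model]
    pool.append(new_model)  # a deque with maxlen drops the oldest model
    return ensemble


# Usage: one chunk with two labeled samples out of six.
pool = deque(maxlen=5)
chunk = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
chunk_labels = ["a", "b", None, None, None, None]
ensemble = process_chunk(pool, chunk, chunk_labels)
print(ensemble_predict(ensemble, (0.3, 0.2)))  # prints "a"
```

The ensemble returned for one chunk is what would be used to classify the next chunk before its own cluster model is trained. Using a bounded `deque` is only the simplest possible replacement policy; the two-pool scheme in the abstract would need a diversity criterion that is not given there.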