Computer Science ›› 2024, Vol. 51 ›› Issue (4): 124-131. doi: 10.11896/jsjkx.230300023

• Database & Big Data & Data Science •

Semi-supervised Classification of Data Stream with Concept Drift Based on Clustering Model Reuse

KANG Wei, LI Lihui, WEN Yimin   

  1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2023-03-02 Revised: 2023-05-16 Online: 2024-04-15 Published: 2024-04-10
  • Supported by:
    Key Research and Development Program of Guangxi (Guike AB21220023), National Natural Science Foundation of China (62366011), and Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2306).

Abstract: Semi-supervised classification of data streams with concept drift poses challenges for classifier training, classifier adaptation to new concepts, and concept drift detection, because only some, or even very few, instances are labeled. Existing semi-supervised clustering-based classification algorithms only incrementally update the clustering models in the classifier pool and cannot effectively reuse historical clustering models. This paper therefore proposes CDCMR, a new semi-supervised classification algorithm based on clustering model reuse. First, the data stream arrives in the form of data chunks; after a data chunk is classified, a clustering model that adaptively determines the number of clusters is trained on it. Second, multiple historical classifiers are selected by calculating the similarity between each historical classifier in the classifier pool and the clustering model. Third, the selected historical classifiers are reused on the current data chunk and integrated with the clustering model. Then, the classifier pool is updated by dividing it into two pools, one maintained by replacing old classifiers with new ones and one maintained by diversity maximization. Finally, the instances of the next data chunk are classified by the ensemble. Experimental results on several synthetic and real data sets show that the algorithm adapts effectively to concept drift and achieves a significant improvement over existing methods.
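
The chunk-wise loop described in the abstract can be illustrated with a minimal Python sketch. This is not the CDCMR implementation: the abstract does not specify the similarity measure, the adaptive clustering procedure, or the pool update rules, so the sketch makes several assumptions. It uses a tiny k-means with a fixed number of clusters in place of adaptive clustering, prediction agreement on the current chunk as a hypothetical similarity measure, and oldest-first replacement as the only pool update. All function and parameter names (`fit_cluster_model`, `agreement`, `process_stream`, `pool_size`, `top_m`) are invented for illustration.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_cluster_model(X, y, k=2, iters=10):
    """Tiny k-means with fixed k (assumption: the paper adapts k instead).
    y[i] is None for unlabeled instances. Each cluster takes the majority
    label of its labeled members; clusters without labels are dropped.
    Assumes every chunk has at least one labeled instance."""
    step = max(1, len(X) // k)
    centroids = [list(X[i]) for i in range(0, len(X), step)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for i, x in enumerate(X):
            j = min(range(len(centroids)), key=lambda c: dist2(x, centroids[c]))
            groups[j].append(i)
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(X[i][d] for i in g) / len(g)
                                for d in range(len(X[0]))]
    model = []
    for c, g in zip(centroids, groups):
        labels = [y[i] for i in g if y[i] is not None]
        if labels:
            model.append((c, Counter(labels).most_common(1)[0][0]))
    return model

def predict(model, x):
    """Label of the nearest labeled cluster centroid."""
    return min(model, key=lambda cl: dist2(x, cl[0]))[1]

def agreement(m1, m2, X):
    """Hypothetical similarity: share of instances where two models agree."""
    return sum(predict(m1, x) == predict(m2, x) for x in X) / len(X)

def ensemble_predict(models, x):
    """Majority vote; ties go to the model listed first (the newest one)."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]

def process_stream(chunks, pool_size=3, top_m=2):
    """Classify each chunk with the current ensemble, then train on it."""
    pool, ensemble, all_preds = [], [], []
    for X, y in chunks:
        if ensemble:  # the first chunk is only used for training
            all_preds.append([ensemble_predict(ensemble, x) for x in X])
        current = fit_cluster_model(X, y)
        # Select the history models most similar to the new clustering model.
        ranked = sorted(pool, key=lambda m: agreement(m, current, X), reverse=True)
        ensemble = [current] + ranked[:top_m]
        # Oldest-first replacement only; no diversity-maximizing pool here.
        pool.append(current)
        if len(pool) > pool_size:
            pool.pop(0)
    return all_preds

chunks = [
    ([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
     [0, None, None, 1, None, None]),
    ([(0.5, 0.5), (9, 9), (1, 1), (10, 10)], [0, 1, None, None]),
    ([(0, 0.5), (10.5, 10)], [0, 1]),
]
predictions = process_stream(chunks)
```

Each chunk is classified before its labels are used for training, so `predictions` holds one list per chunk from the second chunk onward. Placing the newest model first in the ensemble is a design choice of this sketch: it lets the most recent concept win tied votes.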

Key words: Data stream, Semi-supervised learning, Concept drift, Clustering model reuse, Ensemble learning

CLC Number: 

  • TP391