Computer Science ›› 2022, Vol. 49 ›› Issue (3): 92-98. doi: 10.11896/jsjkx.210200047

Special Issue: Big Data & Data Science

• Database & Big Data & Data Science •

Data Stream Ensemble Classification Algorithm Based on Information Entropy Updating Weight

XIA Yuan1, ZHAO Yun-long1,2, FAN Qi-lin1   

  1 School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
    2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
  • Received: 2021-02-04    Revised: 2021-07-08    Online: 2022-03-15    Published: 2022-03-15
  • About author: XIA Yuan, born in 1995, postgraduate. His main research interests include data mining.
    ZHAO Yun-long, born in 1975, Ph.D., professor, is a member of China Computer Federation. His main research interests include pervasive computing, collective computing, wearable computing and swarm intelligence.

Abstract: In a dynamic data stream, instability and concept drift require an ensemble classification model that can adapt to the changing environment in time. Currently, the weights of base classifiers are usually updated with supervision information, so that base classifiers suited to the current environment receive higher weights; however, supervision information is often not immediately available in a real data stream environment. To address this problem, this paper presents a data stream ensemble classification algorithm that updates the weights of base classifiers through information entropy. First, each base classifier is initialized on a random feature subspace to construct the ensemble. Second, a new base classifier is built on each incoming data block and replaces the base classifier with the lowest weight in the ensemble. Then, an entropy-based weight update strategy adjusts the weights of the base classifiers in real time. Finally, the base classifiers that satisfy the weight requirement participate in weighted voting to produce the classification result. Experimental comparisons with several classic learning algorithms show that the proposed method has a clear advantage in classification accuracy and is suitable for various types of concept drift.
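The following is a minimal sketch, not the authors' implementation, of the procedure the abstract describes: base classifiers trained on random feature subspaces, weights refreshed from the Shannon entropy of each member's predictions on the newest (unlabeled) block, and the lowest-weight member replaced by a classifier trained on each new block. All names (EntropyWeightedEnsemble, n_members, subspace_frac) are assumptions, and a Gaussian naive Bayes base learner is used purely for illustration.

import numpy as np
from sklearn.naive_bayes import GaussianNB

class EntropyWeightedEnsemble:
    def __init__(self, n_members=10, subspace_frac=0.7, seed=0):
        self.n_members = n_members
        self.subspace_frac = subspace_frac
        self.rng = np.random.default_rng(seed)
        self.members = []  # each member is [classifier, feature_indices, weight]

    def _train_member(self, X, y):
        # Random feature subspace to keep the base classifiers diverse.
        k = max(1, int(self.subspace_frac * X.shape[1]))
        idx = self.rng.choice(X.shape[1], size=k, replace=False)
        clf = GaussianNB().fit(X[:, idx], y)
        return [clf, idx, 1.0]

    def _entropy_weight(self, clf, idx, X):
        # Unsupervised weight: mean Shannon entropy of the predicted class
        # distributions on the block; lower entropy (more confident) gives higher weight.
        proba = np.clip(clf.predict_proba(X[:, idx]), 1e-12, 1.0)
        h = -(proba * np.log2(proba)).sum(axis=1).mean()
        h_max = np.log2(proba.shape[1]) if proba.shape[1] > 1 else 1.0
        return 1.0 - h / h_max

    def update(self, X_block, y_block):
        # Train a new member on the labeled block and replace the lowest-weight member.
        new_member = self._train_member(X_block, y_block)
        if len(self.members) < self.n_members:
            self.members.append(new_member)
        else:
            worst = min(range(len(self.members)), key=lambda i: self.members[i][2])
            self.members[worst] = new_member
        # Refresh all weights from the block using entropy only; no labels are needed here.
        for m in self.members:
            m[2] = self._entropy_weight(m[0], m[1], X_block)

    def predict(self, X):
        # Weighted vote: each member's prediction counts with its current weight.
        scores = [{} for _ in range(X.shape[0])]
        for clf, idx, w in self.members:
            for i, c in enumerate(clf.predict(X[:, idx])):
                scores[i][c] = scores[i].get(c, 0.0) + w
        return np.array([max(s, key=s.get) for s in scores])

In use, update() would be called once a data block (with its labels) is complete, while predict() can serve incoming instances between updates; the entropy-based weight refresh operates on features alone, which is the point of the approach when supervision arrives with delay.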

Key words: Classification, Concept drift, Data stream, Ensemble algorithm, Information entropy

CLC Number: TP391