Computer Science ›› 2022, Vol. 49 ›› Issue (7): 25-30. doi: 10.11896/jsjkx.210600155

• Database & Big Data & Data Science •

Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition

CHEN Yuan-yuan, WANG Zhi-hai   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2021-06-19 Revised: 2021-12-07 Online: 2022-07-15 Published: 2022-07-12
  • About author: CHEN Yuan-yuan, born in 1997, master. Her main research interests include data stream mining and unsupervised learning.
    WANG Zhi-hai, born in 1963, Ph.D., professor, Ph.D. supervisor, is a member of the China Computer Federation. His main research interests include data mining and business intelligence, machine learning, and computational intelligence.
  • Supported by:
    National Natural Science Foundation of China (61771058).

Abstract: Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Some current concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures are prone to detection blind spots and have poor interpretability, while grid methods consume too much memory on multi-dimensional data. To address these problems, a concept drift detection method for multi-dimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm builds on k-Means to partition the data with uniform density, then applies the chi-square test to statistics computed on each partition to detect concept drift. To verify the validity of the method, experiments were conducted on four artificial datasets and three real datasets, and the type I and type II error rates were compared and analyzed on data of different dimensions. Experimental results show that PUDC outperforms several recent algorithms in concept drift detection on multi-dimensional data streams.
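The general idea in the abstract (partition a reference window with k-Means, build a histogram over the partitions, and compare it with a new window using a chi-square test) can be illustrated with a minimal sketch. This is not the authors' PUDC implementation: it uses plain Lloyd's k-Means as a stand-in and does not reproduce the uniform-density partitioning step. The function names, the window sizes, the choice of 5 partitions, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in, not the paper's partitioning); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def histogram(points, centroids):
    """Count how many points fall into each centroid's partition."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts

def chi_square_stat(ref_counts, new_counts):
    """Pearson chi-square statistic for homogeneity of two histograms."""
    n_ref, n_new = sum(ref_counts), sum(new_counts)
    total = n_ref + n_new
    stat = 0.0
    for r, o in zip(ref_counts, new_counts):
        col = r + o
        if col == 0:  # skip partitions that are empty in both windows
            continue
        for observed, row in ((r, n_ref), (o, n_new)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical data: a 2-D reference window and a window with a shifted mean.
rng = random.Random(42)
reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
shifted = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(500)]

K = 5             # assumed number of partitions
CRITICAL = 9.488  # chi-square 0.95 quantile for K - 1 = 4 degrees of freedom

centroids = kmeans(reference, K)
ref_counts = histogram(reference, centroids)
same_stat = chi_square_stat(ref_counts, ref_counts)
drift_stat = chi_square_stat(ref_counts, histogram(shifted, centroids))
print("drift flagged:", drift_stat > CRITICAL)
```

In this sketch, drift is flagged when the statistic exceeds the critical value for K - 1 degrees of freedom. The partition count and the significance level both affect the type I and type II error rates, so they would need tuning on real streams.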

Key words: k-Means, Concept drift detection, Data stream mining, Histogram, Hypothesis test

CLC Number: TP391