Computer Science ›› 2019, Vol. 46 ›› Issue (12): 74-82.doi: 10.11896/jsjkx.190600158

• Big Data & Data Science • Previous Articles     Next Articles

Euler Kernel-based Data Stream Clustering Algorithm

ZHU Ying-wen1,2, YANG Jun1   

  1. (School of Computer Science and Engineering,Sanjiang University,Nanjing 210012,China)1;
    (College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 210016,China)2
  • Received:2019-05-26 Online:2019-12-15 Published:2019-12-17

Abstract: With the advance of both cloud computing and internet of things,many applications generate huge amounts of data streams at fast speed.Examples include real-time surveillance systems,vehicle traffic monitoring systems,electricity consumption recording,and network traffic monitoring.Data stream mining has become a hot research topic.Its goal is to extract hidden knowledge/patterns from continuous data streams.Clustering,one of the most important problems in stream mining,has been highly explored.Different from traditional data clustering algorithm where given datasets are generally static and can be repeatedly read and processed,clustering data streams face more challenges due to having to satisfy such constraints as bounded memory,single-pass,real-time response and concept-drift detection.This paper pre-sented a new clustering algorithm for data streams,called EG-Stream,by combining the Euler kernel method with the Growing Neural Gas(GNG) model.It can not only maintain the benefit of nonlinear modeling using kernel function,but also significantly solve the large scale computational problem in kernel-based clustering.Euler kernel is relying on a nonlinear and robust cosine metric that is less sensitive to outliers.More important,it intrinsically induces an empirical map which maps data onto a complex space of the same dimension,and it takes these advantages to measure the similari-ty between data in a robust way without increasing the dimensionality of data,which avoids the problem that other kernel clustering algorithms can not deal with data streams.Although this method is embarrassingly simple just by incorporating the Euler kernel into GNG,the experimental results on variety of UCI datasets indicate that this method can still achieve comparable or even better performance than G-Stream algorithm,and identify the structural information from stream data.

Key words: Data stream clustering, Euler kernel, GNG, Kernel method

CLC Number: 

  • TP391
[1]NGUYEN H L,WOON Y K,NG W K.A survey on data stream clustering and classification[J].Knowledge and information systems,2015,45(3):535-569.
[2]GABER M M,ZASLAVSKY A,KRISHNASWAMY S.Mining data streams:a review[J].ACM Sigmod Record,2005,34(2):18-26.
[3]GAMA J,RODRIGUESP P.An overview on mining data streams[M]//Foundations of Computational,Intelligence Volume 6.Berlin:Springer,2009:29-45.
[4]AGGARWAL C C.Data streams:An overview and scientific applications[M]//Scientific Data Mining and Knowledge Discovery.Berlin:Springer,2009:377-397.
[5]SILVA J A,FARIA E R,BARROS R C,et al.Data stream clustering:A survey[J].ACM Computing Surveys (CSUR),2013,46(1):13.
[6]LI Y,YANG G,HE H,et al.A study of large-scale data clustering based on fuzzy clustering[J].Soft Computing,2016,20(8):3231-3242.
[7]ZHANG P,SHEN Q.Fuzzy c-means based coincidental link filtering in support of inferring social networks from spatiotemporal data streams[J].Soft Computing,2018,22(21):7015-7025.
[8]O’CALLAGHAN L,MISHRA N,MEYERSON A,et al. Streaming-data algorithms for high-quality clustering[C]//Proceedings 18th International Conference on Data Engineering.IEEE,2002:685-694.
[9]AGGARWAL C C,HAN J,WANG J,et al.A framework for clustering evolving data streams[C]//Proceedings of the 29th International Conference on Very Large Data Bases-Volume 29.VLDB Endowment,2003:81-92.
[10]AGGARWAL C C,HAN J,WANG J,et al.A framework for projected clustering of high dimensional data streams[C]//Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30.VLDB Endowment,2004:852-863.
[11]ZHOU A,CAO F,QIAN W,et al.Tracking clusters in evolving data streams over sliding windows[J].Knowledge and Information Systems,2008,15(2):181-214.
[12]UDOMMANETANAKIT K,RAKTHANMANON T,WAIYAMAI K.E-stream:Evolution-based technique for stream clustering[C]//International Conference on Advanced Data Mining and Applications.Berlin:Springer,2007:605-615.
[13]LÜHR S,LAZARESCU M.Incremental clustering of dynamic data streams using connectivity based representative points[J].Data & Knowledge Engineering,2009,68(1):1-27.
[14]CAO F,ESTERT M,QIAN W,et al.Density-based clustering over an evolving data stream with noise[C]//Proceedings of the 2006 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2006:328-339.
[15]TASOULIS D K,ROSS G,ADAMS N M.Visualising the cluster structure of data streams[C]//International Symposium on Intelligent Data Analysis.Berlin:Springer,2007:81-92.
[16]KRIEGEL H P,KRÖGER P,NTOUTSI I,et al.Density based subspace clustering over dynamic data[C]//International Conference on Scientific and Statistical Database Management.Berlin:Springer,2011:387-404.
[17]CHEN Y,TU L.Density-based clustering for real-time stream data[C]//Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2007:133-142.
[18]WAN L,NG W K,DANG X H,et al.Density-based clustering of data streams at multiple resolutions[J].ACM Transactions on Knowledge Discovery from Data (TKDD),2009,3(3):14.
[19]PARK N H,LEE W S.Statistical grid-based clustering over data streams[J].Acm Sigmod Record,2004,33(1):32-37.
[20]DANG X H,LEE V,NG W K,et al.An EM-based algorithm for clustering data streams in sliding windows[C]//International Conference on Database Systems for Advanced Applications.Berlin:Springer,2009:230-235.
[21]SMITH T,ALAHAKOON D.Growing self-organizing map for online continuous clustering[M]//Foundations of ComputationalIntelligence Volume 4.Berlin:Springer,2009:49-83.
[22]GHESMOUNE M,LEBBAH M,AZZAG H.A new growing neural gas for clustering data streams[J].Neural Networks,2016,78:36-50.
[23]SCHOLKOPF B,SMOLA A,MULLER K R.Nonlinear component analysis as a kernel eigenvalue problem[J].Neural computation,1998,10(5):1299-1319.
[24]MIKA S,RATSCH G,WESTON J,et al.Fish discriminant analysis with kernels[C]//Proceedings of IEEE Neural Networks for Signal Processing Workshop(NNSP).Madison,WI:IEEE Press,1999:41-48.
[25]ZHANG D Q.Kernel-Based Associative Memories,Clustering Algorithms and their Applications[D].Nanjing:Nanjing University of Aeronautics and Astronautics,2004.(in Chinese)
张道强.基于核的联想记忆及聚类算法的研究与应用[D].南京:南京航空航天大学,2004.
[26]ZHANG D Q,CHEN S C.Kernel-Based fuzzy and possibilities C-Means clustering[C]//Proceedings of the 13th International Conference on Artificial Neural Networks.Istanbul,Turkey:Springer,2003:122-125.
[27]GIROLAMI M.Mercer-based clustering in feature space[J]. IEEE Transactions on Neural Networks,2002,13(3):780-784.
[28]LIWICKI S,TZIMIROPOULOS G,ZAFEIRIOU S,et al.Euler principal component analysis[J].International Journal of Computer Vision,2013,101(3):498-518.
[29]WU J S,ZHENG W S,LAI J H.Euler Clustering[C]//Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence.2013:1792-1798.
[30]CAO F,ESTERT M,QIAN W,et al.Density-based clustering over an evolving data stream with noise[C]//Proceedings of the 2006 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2006:328-339.
[31]STREHL A,GHOSH J.Cluster ensembles-a knowledge reuse framework for combining multiple partitions[J].Journal of Machine Learning Research,2002,3:583-617.
[32]RAND W M.Objective criteria for the evaluation of clustering methods[J].Journal of the American Statistical Association,1971,66(336):846-850.
[1] XU Jian-min, SUN Peng, WU Shu-fang. Microblog Rumor Detection Method Based on Propagation Path Tree Kernel Learning [J]. Computer Science, 2022, 49(6): 342-349.
[2] WANG Zhong-yuan, LIU Jing-lei. Kernel Subspace Clustering Based on Second-order Neighbors [J]. Computer Science, 2021, 48(6): 86-95.
[3] GONG Qu and XU Kai-qiang. Parameter-less Supervised Kernel Locality Preserving Projection and Face Recognition [J]. Computer Science, 2016, 43(9): 301-304.
[4] ZHENG Jian-wei, KONG Chen-chen, WANG Wan-liang, QIU Hong and ZHANG Hang-ke. Kernel-based Supervised Neighborhood Projection Analysis Algorithm [J]. Computer Science, 2016, 43(6): 312-315.
[5] WANG Pei-yan and CAI Dong-feng. Distance-based Kernel Evaluation Measure [J]. Computer Science, 2014, 41(2): 72-75.
[6] QU Wu,WANG Li-jun and HAN Xiao-guang. Distributed Data Stream Clustering Based on LSH on Cloud Environments [J]. Computer Science, 2014, 41(11): 195-202.
[7] . Face Feature Extraction Based on Weighted Multiple Kernel Fisher Discriminant Analysis [J]. Computer Science, 2012, 39(9): 262-265.
[8] FANG Yi-min,ZHANG Ling,SUN Wei-min,XU Bao-guo. SMO Algorithm for Resolving Huber-SVR with Non-positive Kernels [J]. Computer Science, 2010, 37(7): 212-216.
[9] . [J]. Computer Science, 2009, 36(3): 156-157.
[10] . [J]. Computer Science, 2008, 35(12): 190-191.
[11] JIA Lei LIAO Shi-zhong (School of Computer Science and Technology,Tianjin University,Tianjin 300072,China). [J]. Computer Science, 2008, 35(12): 148-150.
[12] . [J]. Computer Science, 2007, 34(9): 131-134.
[13] . [J]. Computer Science, 2006, 33(6): 172-174.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!