Computer Science ›› 2013, Vol. 40 ›› Issue (10): 226-230.


Double k-nearest Neighbors of Heterogeneous Data Stream Clustering Algorithm

HUANG De-cai, SHEN Xian-qiao and LU Yi-hong

  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Most existing data stream clustering algorithms can handle data with numerical attributes but cannot cope with data containing both numerical and categorical attributes. Moreover, existing heterogeneous data stream algorithms leave considerable room for improvement in data standardization and clustering. To address both issues, a heterogeneous data stream clustering algorithm based on double k-nearest neighbors is proposed. The algorithm adopts CluStream’s online-offline framework and clusters in three steps. First, it uses double k-nearest neighbors and an improved dimension distance to form micro-clusters. Second, it uses a dynamic data standardization method and a mean-based cosine model to form initial macro-clusters. Third, it uses the mean-based cosine model and prior clusters to optimize the macro-clustering. Experimental results demonstrate that the proposed method improves clustering accuracy and scalability.

Key words: Data stream, Heterogeneous, Clustering, Double k-nearest neighbors
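
The abstract does not define the double k-nearest neighbors rule or the improved dimension distance, so the following Python sketch is only a generic illustration of the underlying problem: measuring distance between records that mix numerical and categorical attributes, then finding the k nearest micro-cluster centers. The names `mixed_distance` and `k_nearest`, the range-scaled numeric term, and the 0/1 categorical mismatch term are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[float]) -> float:
    """Mean per-dimension distance between two mixed-attribute records.

    Assumption (not from the paper): numeric dimensions use the absolute
    difference scaled by the observed range; categorical dimensions
    (strings) use a simple 0/1 mismatch.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(a)


def k_nearest(point: Sequence[Value], centers: List[Sequence[Value]],
              ranges: Sequence[float], k: int) -> List[int]:
    """Indices of the k centers closest to `point`, nearest first."""
    order = sorted(range(len(centers)),
                   key=lambda i: mixed_distance(point, centers[i], ranges))
    return order[:k]


# Hypothetical micro-cluster centers: one numeric and one categorical attribute.
centers = [(1.0, "a"), (5.0, "b"), (9.0, "a")]
ranges = [8.0, 0.0]  # the range entry is ignored for the categorical dimension
nearest = k_nearest((2.0, "a"), centers, ranges, k=2)
```

In a streaming setting the `ranges` values would have to be updated as new records arrive, which is presumably what a dynamic standardization step addresses; the sketch keeps them fixed for simplicity.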

