Computer Science ›› 2018, Vol. 45 ›› Issue (2): 125-129.doi: 10.11896/j.issn.1002-137X.2018.02.022

Previous Articles     Next Articles

Clustering Algorithm Based on Shared Nearest Neighbors and Density Peaks

LIU Yi-zhi, CHENG Ru-feng and LIANG Yong-quan   

  • Online:2018-02-15 Published:2018-11-13

Abstract: Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors(FKNN-DPC) is a simple and efficient clustering algorithm,which can automatically detect the cluster center and assign the non-cluster center sample based on weighted K-nearest neighbors quickly and accurately.It is powerful in recognizing high quality cluster in any scale ,any dimension,any size and any shape of the data set,but the weight calculation in assigning strategies only considers the Euclidean distance between samples.In this paper,a similarity measure based on shared neighborhood was proposed,and the sample assigning strategy was improved by this similarity,so that the cluster is more consistent with the real attribution,thus improving the clustering quality.The effectiveness of the algorithm is verified by comparing the experiments on the UCI real data set with the K-means,DBSCAN,AP,DPC,and FKNN-DPC algorithm.

Key words: Clustering,Shared nearest neighbors,Similarity measure,Density peak

[1] JAIN A K,DUBES R C.Algorithms for clustering data[M].Upper Saddle Riever:Prentice Hall,1988.
[2] BERKHIN P.A Survey of Clustering Data Mining Techniques[J].Grouping Multidimensional Data,2006,43(1):25-71.
[3] XU R,ND W D.Survey of clustering algorithms[J].IEEETransactions on Neural Networks,2005,16(3):645-678.
[4] GELBARD R,GOLDMAN O,SPIEGLER I.Investigating diversity of clustering methods:An empirical comparison[J].Data & Knowledge Engineering,2007,63(1):155-166.
[5] SUN J G,LIU J,ZHAO L Y.Clustering Algorithms Research[J].Journal of Software,2008,19(1):48-61.(in Chinese) 孙吉贵,刘杰,赵连宇.聚类算法研究[J].软件学报,2008,19(1):48-61.
[6] MACQUEEN J.Some Methods for Classification and Analysisof MultiVariate Observations[C]∥Proc.of Berkeley Symposiumon Mathematical Statistics and Probability.1967:281-297.
[7] GUHA S,RASTOGI R,SHIM K,et al.CURE:An EfficientClustering Algorithm for Large Databases[J].Information Systems,1998,26(1):35-58.
[8] KARYPIS G,HAN E H,KUMAR V.CHAMELEON A hierarchical clustering algorithm using dynamic modeling[J].Compu-ter,1999,32(8):68-75.
[9] ESTER M,KRIEGEL H P,XU X.A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]∥ International Conference on Knowledge Discovery and Data Mining.1996:226-231.
[10] FAN J.OPE-HCA:an optimal probabilistic estimation approach for hierarchical clustering algorithm[J/OL].Neural Computing &Applications,2015:1-11.https://link.springer.com/article/10.1007%2Fs00521-015-1998-5.
[11] AGRAWAL R,GEHRKE J,GUNOPULOS D,et al.Automatic subspace clustering of high dimensional data for data mining applications[M]∥ACM SIGMOD Record.ACM,1998:94-105.
[12] FREY B J,DUECK D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976.
[13] RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[14] DU M,DING S,JIA H.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99:135-145.
[15] MEHMOOD R,ZHANG G,BIE R,et al.Clustering by fastsearch and find of density peaks via heat diffusion[J].Neurocomputing,2016,208(C):210-217.
[16] XIE J Y,GAO H C,XIE W X.K-nearest neighbors optimized clustering algorithm by fast search and nding the density peaks of a dataset[J].Scientia Sinica Informationis,2016,46(2):258-280.(in Chinese) 谢娟英,高红超,谢维信.K 近邻优化的密度峰值快速搜索聚类算法[J].中国科学:信息科学,2016,46(2):258-280.
[17] XIE J,GAO H,XIE W,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,354(C):19-40.
[18] ZHANG W,LI J.Extended fast search clustering algorithm:widely density clusters,no density peaks[J].arXiv preprint arXiv:1505.05610,2015.
[19] LICHMAN M.UCI Machine Learning Repository .ht-tp://archive.ics.uci.edu/ml.
[20] VINH N X,EPPS J,BAILEY J.Information Theoretic Mea-sures for Clusterings Comparison:Variants,Properties,Norma-lization and Correction for Chance[C]∥International Confe-rence on Machine Learning(ICML 2009).Montreal,Quebec,Canada,2009:2837-2854.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!