计算机科学 ›› 2018, Vol. 45 ›› Issue (2): 125-129.doi: 10.11896/j.issn.1002-137X.2018.02.022

• 第六届全国智能信息处理学术会议 • 上一篇    下一篇

一种基于共享近邻的密度峰值聚类算法

刘奕志,程汝峰,梁永全   

  1. 山东科技大学计算机科学与工程学院 山东 青岛266590 山东省智慧矿山信息技术重点实验室 山东 青岛266590,山东科技大学计算机科学与工程学院 山东 青岛266590 山东省智慧矿山信息技术重点实验室 山东 青岛266590,山东科技大学计算机科学与工程学院 山东 青岛266590 山东省智慧矿山信息技术重点实验室 山东 青岛266590
  • 出版日期:2018-02-15 发布日期:2018-11-13
  • 基金资助:
    本文受国家自然科学基金(61203305,2),山东省自然科学基金(ZR2015FM013),山东省重点研发计划(攻关)(2016GSF120012),山东省“泰山学者”攀登计划资助

Clustering Algorithm Based on Shared Nearest Neighbors and Density Peaks

LIU Yi-zhi, CHENG Ru-feng and LIANG Yong-quan   

  • Online:2018-02-15 Published:2018-11-13

摘要: 基于加权K近邻的密度峰值发现算法(FKNN-DPC)是一种简单、高效的聚类算法,能够自动发现簇中心,并采用加权K近邻的思想快速、准确地完成对非簇中心样本的分配,在各种规模、任意维度、任意形状的数据集上都能得到高质量的聚类结果,但其样本分配策略中的权重仅考虑了样本间的欧氏距离。文中提出了一种基于共享近邻的相似度度量方式,并以此相似度改进样本分配策略,使得样本的分配更符合真实的簇归属情况,从而提高聚类质量。在UCI真实数据集上进行实验,并将所提算法与K-means,DBSCAN,AP,DPC,FKNN-DPC等算法进行对比,验证了其有效性。

关键词: 聚类,共享近邻,相似性度量,密度峰值

Abstract: Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors(FKNN-DPC) is a simple and efficient clustering algorithm,which can automatically detect the cluster center and assign the non-cluster center sample based on weighted K-nearest neighbors quickly and accurately.It is powerful in recognizing high quality cluster in any scale ,any dimension,any size and any shape of the data set,but the weight calculation in assigning strategies only considers the Euclidean distance between samples.In this paper,a similarity measure based on shared neighborhood was proposed,and the sample assigning strategy was improved by this similarity,so that the cluster is more consistent with the real attribution,thus improving the clustering quality.The effectiveness of the algorithm is verified by comparing the experiments on the UCI real data set with the K-means,DBSCAN,AP,DPC,and FKNN-DPC algorithm.

Key words: Clustering,Shared nearest neighbors,Similarity measure,Density peak

[1] JAIN A K,DUBES R C.Algorithms for clustering data[M].Upper Saddle Riever:Prentice Hall,1988.
[2] BERKHIN P.A Survey of Clustering Data Mining Techniques[J].Grouping Multidimensional Data,2006,43(1):25-71.
[3] XU R,ND W D.Survey of clustering algorithms[J].IEEETransactions on Neural Networks,2005,16(3):645-678.
[4] GELBARD R,GOLDMAN O,SPIEGLER I.Investigating diversity of clustering methods:An empirical comparison[J].Data & Knowledge Engineering,2007,63(1):155-166.
[5] SUN J G,LIU J,ZHAO L Y.Clustering Algorithms Research[J].Journal of Software,2008,19(1):48-61.(in Chinese) 孙吉贵,刘杰,赵连宇.聚类算法研究[J].软件学报,2008,19(1):48-61.
[6] MACQUEEN J.Some Methods for Classification and Analysisof MultiVariate Observations[C]∥Proc.of Berkeley Symposiumon Mathematical Statistics and Probability.1967:281-297.
[7] GUHA S,RASTOGI R,SHIM K,et al.CURE:An EfficientClustering Algorithm for Large Databases[J].Information Systems,1998,26(1):35-58.
[8] KARYPIS G,HAN E H,KUMAR V.CHAMELEON A hierarchical clustering algorithm using dynamic modeling[J].Compu-ter,1999,32(8):68-75.
[9] ESTER M,KRIEGEL H P,XU X.A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]∥ International Conference on Knowledge Discovery and Data Mining.1996:226-231.
[10] FAN J.OPE-HCA:an optimal probabilistic estimation approach for hierarchical clustering algorithm[J/OL].Neural Computing &Applications,2015:1-11.https://link.springer.com/article/10.1007%2Fs00521-015-1998-5.
[11] AGRAWAL R,GEHRKE J,GUNOPULOS D,et al.Automatic subspace clustering of high dimensional data for data mining applications[M]∥ACM SIGMOD Record.ACM,1998:94-105.
[12] FREY B J,DUECK D.Clustering by passing messages between data points[J].Science,2007,315(5814):972-976.
[13] RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[14] DU M,DING S,JIA H.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99:135-145.
[15] MEHMOOD R,ZHANG G,BIE R,et al.Clustering by fastsearch and find of density peaks via heat diffusion[J].Neurocomputing,2016,208(C):210-217.
[16] XIE J Y,GAO H C,XIE W X.K-nearest neighbors optimized clustering algorithm by fast search and nding the density peaks of a dataset[J].Scientia Sinica Informationis,2016,46(2):258-280.(in Chinese) 谢娟英,高红超,谢维信.K 近邻优化的密度峰值快速搜索聚类算法[J].中国科学:信息科学,2016,46(2):258-280.
[17] XIE J,GAO H,XIE W,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,354(C):19-40.
[18] ZHANG W,LI J.Extended fast search clustering algorithm:widely density clusters,no density peaks[J].arXiv preprint arXiv:1505.05610,2015.
[19] LICHMAN M.UCI Machine Learning Repository .ht-tp://archive.ics.uci.edu/ml.
[20] VINH N X,EPPS J,BAILEY J.Information Theoretic Mea-sures for Clusterings Comparison:Variants,Properties,Norma-lization and Correction for Chance[C]∥International Confe-rence on Machine Learning(ICML 2009).Montreal,Quebec,Canada,2009:2837-2854.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[2] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[3] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[4] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[5] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[6] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[7] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[8] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[9] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .
[10] 王振朝,侯欢欢,连蕊. 抑制CMT中乱序程度的路径优化方案[J]. 计算机科学, 2018, 45(4): 122 -125 .