计算机科学 ›› 2021, Vol. 48 ›› Issue (3): 151-157.doi: 10.11896/jsjkx.200100112
汤鑫瑶, 张正军, 储杰, 严涛
TANG Xin-yao, ZHANG Zheng-jun, CHU Jie, YAN Tao
摘要: 针对密度峰值聚类算法(Density Peaks Clustering,DPC)需要人为指定截断距离dc,以及局部密度定义简单和一步分配策略导致算法在复杂数据集上表现不佳的问题,提出了一种基于自然最近邻的密度峰值聚类算法(Density Peaks Clustering based on Natural Nearest Neighbor,NNN-DPC)。该算法无需指定任何参数,是一种非参数的聚类方法。该算法首先根据自然最近邻的定义,给出新的局部密度计算方法来描述数据的分布,揭示内在的联系;然后设计了两步分配策略来进行样本点的划分。最后定义了簇间相似度并提出了新的簇合并规则进行簇的合并,从而得到最终聚类结果。实验结果表明,在无需参数的情况下,NNN-DPC算法在各类数据集上都有优秀的泛化能力,对于流形数据或簇间密度差异大的数据能更加准确地识别聚类数目和分配样本点。与DPC、FKNN-DPC(Fuzzy Weighted K-nearest Density Peak Clustering)以及其他3种经典聚类算法的性能指标相比,NNN-DPC算法更具优势。
中图分类号:
[1]MACQUEEN J.Some Methods for Classificati-on and Analysis of MultiVariate Observations[C]//Proc of Berkeley Sympo-sium on Mathematical Statistics & Probability.1967,1(14):281-297. [2]KAUFMAN L,ROUSSEEUW P.Clustering by means of medoids[C]//Statistical Data Analysis Based on the L1-norm & Related Methods.North-Holland,1987:405-416. [3]FREY B J,DUECK D.Clustering by Passing Messages Between Data Points [J].Science,2007,315(5814):972-976. [4]ESTER M,KRIEGEL H P,SANDER J,et al.A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]//Internation-al Conference on Knowledge Discovery & Data Mining.1996:226-231. [5]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks [J].Science,2014,344(6191):1492-1496. [6]YANG Z,WANG H J,ZHOU Y.A clusteri-ng algorithm based on cutoff distance and adaptive cluster center [J].Data Analysis and Knowledge Discovery,2018,2(3):39-48. [7]WANG Y,ZHANG G Z.Density peak algorithm by automatically determining cluster centers [J].Computer Engineering and Applications,2018,54(8):137-142. [8]LIU R,WANG H,YU X.Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226. [9]DU M J,DING S F,JIA H J.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99:135-145. [10]XIE J Y,GAO H C,XIE W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,345:19-40. [11]YANG L J,ZHU Q S,HUANG J L,et al.Adap-tive edited na-tural neighbor algorithm[J].Neurocomputing,2017,230:427-433. [12]HUANG J L,ZHU Q S,YANG L J,et al.QCC:a novel clustering algorithm based on quasicluster centers[J].Machine Lear-ning,2017,106(3):337-357. [13]ZHU Q S,FENG J,HUANG J L.Natural neighbor:a self adaptive neighborhood method without parameter K[J].Pattern Recognition Letters,2016,80:30-36. [14]STEVENS S S.Mathematics,measurement,and psychophysics[C]//Handbook of Experimental Psychology.London:Wiley,1951:1-49. [15]ZOU X L.Application of Natural Nearest Neighbor in High-dimensional Data Structure Learning[D].Chongqing:Chongqing University,2011. [16]HUANG J L.Research on Parameterless Clustering Algorithm Based on Natural Nearest Neighbor[D].Chongqing:Chongqing University,2014. [17]VINH N X,EPPS J,BAILEY J.Information theoretic measures for clusterings comparison:Variants,properties,normalization and correction for chance[J].Journal of Machine Learning Research,2010,11(Oct):2837-2854. [18]FOWLKES E B,MALLOWS C L.A method for comparing two hierarchical clusterings[J].Journal of the American Statistical Association,1983,78(383):553-569. [19]PEDREGOSA F,VAROQUAUX G,GRAMFORT A,et al.Scikit-learn:Machine learning in Python[J].Journal of Machine Learning Research,2011,12(Oct):2825-2830. [20]CHANG H,YEUNG D Y.Robust path-based spectral cluste-ring[J].Pattern Recognition,2008,41(1):191-203. [21]GIONIS A,MANNILA H,TSAPARAS P.Clusteri-ng aggregation[J].ACM Transactions on Knowledge Discovery from Data (TKDD),2007,1(1):4. [22]VEENMAN C J,REINDERS M J T,BACKER E.A maximum variance cluster algorithm[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,24(9):1273-1280. [23]FRANTI P,VIRMAJOKI O,HAUTAMAKI V.Fast agglomera-tive clustering using a k-nearest neighbor graph[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2006,28(11):1875-1881. [24]BACHE K,LICHMAN M.UCI machine learning repository[EB/OL].[2019-12-20].http://archive.ics.uci.edu/ml. |
[1] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[2] | 张亚迪, 孙悦, 刘锋, 朱二周. 结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究 Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index 计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148 |
[3] | 李杉, 许新征. 基于双角度并行剪枝的VGG16优化方法 Parallel Pruning from Two Aspects for VGG16 Optimization 计算机科学, 2021, 48(6): 227-233. https://doi.org/10.11896/jsjkx.200800016 |
[4] | 王茂光, 杨行. 一种基于AP-Entropy选择集成的风控模型和算法 Risk Control Model and Algorithm Based on AP-Entropy Selection Ensemble 计算机科学, 2021, 48(11A): 71-76. https://doi.org/10.11896/jsjkx.210200110 |
[5] | 王卫东, 徐金慧, 张志峰, 杨习贝. 基于密度峰值聚类的高斯混合模型算法 Gaussian Mixture Models Algorithm Based on Density Peaks Clustering 计算机科学, 2021, 48(10): 191-196. https://doi.org/10.11896/jsjkx.200800191 |
[6] | 张煜, 陆亿红, 黄德才. 基于密度峰值的加权犹豫模糊聚类算法 Weighted Hesitant Fuzzy Clustering Based on Density Peaks 计算机科学, 2021, 48(1): 145-151. https://doi.org/10.11896/jsjkx.200400043 |
[7] | 徐守坤, 倪楚涵, 吉晨晨, 李宁. 基于YOLOv3的施工场景安全帽佩戴的图像描述 Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3 计算机科学, 2020, 47(8): 233-240. https://doi.org/10.11896/jsjkx.190600109 |
[8] | 张琴, 陈红梅, 封云飞. 一种基于粗糙集和密度峰值的重叠社区发现方法 Overlapping Community Detection Method Based on Rough Sets and Density Peaks 计算机科学, 2020, 47(5): 72-78. https://doi.org/10.11896/jsjkx.190400160 |
[9] | 陈俊芬,张明,赵佳成. 复杂高维数据的密度峰值快速搜索聚类算法 Clustering Algorithm by Fast Search and Find of Density Peaks for Complex High-dimensional Data 计算机科学, 2020, 47(3): 79-86. https://doi.org/10.11896/jsjkx.190400123 |
[10] | 田献珍, 孙立强, 田振中. 基于蚁群算法的图像重建 Image Reconstruction Based on Ant Colony Algorithm 计算机科学, 2020, 47(11A): 231-235. https://doi.org/10.11896/jsjkx.191000128 |
[11] | 邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用 Application of Improved DBSCAN Algorithm on Spark Platform 计算机科学, 2020, 47(11A): 425-429. https://doi.org/10.11896/jsjkx.190700071 |
[12] | 李晓光, 邵超. 基于网格数据中心的密度峰值聚类算法 Density Peak Clustering Algorithm Based on Grid Data Center 计算机科学, 2019, 46(6A): 457-460. |
[13] | 张建新, 刘弘, 李焱. 一种面向人群疏散的高效分组方法 Efficient Grouping Method for Crowd Evacuation 计算机科学, 2019, 46(6): 231-238. https://doi.org/10.11896/j.issn.1002-137X.2019.06.035 |
[14] | 李长镜,赵书良,池云仙. 一种基于谱嵌入和局部密度的离群点检测算法 Outlier Detection Algorithm Based on Spectral Embedding and Local Density 计算机科学, 2019, 46(3): 260-266. https://doi.org/10.11896/j.issn.1002-137X.2019.03.039 |
[15] | 胡闯, 杨庚, 白云璐. 面向差分隐私保护的聚类算法 Clustering Algorithm in Differential Privacy Preserving 计算机科学, 2019, 46(2): 120-126. https://doi.org/10.11896/j.issn.1002-137X.2019.02.019 |
|