计算机科学 ›› 2024, Vol. 51 ›› Issue (8): 97-105.doi: 10.11896/jsjkx.230500226
王心耕1, 杜韬1,2, 周劲1,2, 陈迪1, 仵匀政1
WANG Xingeng1, DU Tao1,2, ZHOU Jin1,2, CHEN Di1, WU Yunzheng1
摘要: 密度峰值聚类算法(DPC)是一种简单高效的无监督聚类算法,该算法虽能自动发现簇中心,实现任意形状数据的高效聚类,但依然存在一些缺陷。针对密度峰值聚类算法在定义相关度量值时未考虑数据的位置信息、聚类中心数目需要人工预先设定且分配样本点时易出现连锁反应这3个缺陷,提出一种基于共享最近邻的自适应密度峰值聚类算法。首先,利用共享最近邻重新定义局部密度等度量值,充分考虑了数据分布的局部特点,使样本点的空间分布特征得以更好地体现;其次,通过引入密度衰减现象让样本点自动聚集成微簇,实现了簇个数自适应确定和簇中心自适应选取;最后,提出一种两阶段的分配方法,先将微簇合并形成簇的主干部分,再用上一步分配好的簇主干指导剩余点的分配,避免了链式反应的发生。在二维合成数据集以及UCI数据集上的实现表明,相较于经典的密度峰值聚类算法及近年来对其提出的改进算法,在大多数情况下,所提算法表现出更优异的性能。
中图分类号:
[1]SUN L,QIN X Y,XU J C,et al.Density Peak Clustering Algorithm Based on K-Nearest Neighbors and Optimal Assignment Strategy[J].Journal of Software,2022,33(4):1390-141. [2]CERQUITELLI T,VENTURA F,APILETTI D,et al.Enhancing manufacturing intelligence through an unsupervised data-driven methodology for cyclic industrial processes[J].Expert Systems with Applications,2021,182(3):115269. [3]YANG L,CHEUNG Y M,YUAN Y T.Self-Adaptive Multiprototype-Based Competitive Learning Approach:Ak-Means-Type Algorithm for Imbalanced Data Clustering[J].IEEE Transactions on Cybernetics,2019,51(3):1598-1612. [4]AHMAD A,KHAN S S.initKmix-A novel initial partitiongeneration algorithm for clustering mixed data using k-means-based clustering[J].Expert Systems with Applications,2020,167(2):114149. [5]JIANG J T,ZHENG C H.Density Peak and Grid Based Clustering for Large-scale Node Partition[J].Journal of Chinese Mini-Micro Computer Systems,2022,43(3):498-505. [6]SUN L,LIANG Y Q.Improved Clustering Algorithm FusingGrid Partition and DBSCAN[J].Computer Engineering and Applications,2022,58(14):73-79. [7]HU C A,WANG J X,MAO Y M.Density-based clustering algorithm based on groups and improve gravitational search[J].Application Research of Computers,2021,38(11):3293-3299. [8]GUO W J,WANG W H,ZHAO S P,et al.Density Peak Clustering with connectivity estimation[J].Knowledge-Based Systems,2022,243(5):108501. [9]ZHANG T,RAMAKRISHNAN R,LIVNY M.BIRCH:an efficient data clustering method for very large databases[J].ACM Sigmod Record,1996,25(2):103-114. [10]WANG R,ZHOU J,JIANG H,et al.A general transfer lear-ning-based Gaussian mixture modelfor clustering[J].Interna-tional Journal of Fuzzy Systems,2021,23(3):776-793. [11]LI K,ZHANG K X.Structural α-Entropy Weighting Gaussian Mixture Model for Subspace Clustering[J].Chinese Journal of Electronics,2022,50(3):718-725. [12]HARTIGAN J A,WONG M A.Algorithm AS 136:A K-Means Clustering Algorithm[J].Journal of the Royal Statistical Society,1979,28(1):100-108. [13]ESTER M,KRIEGEL H P,SANDER J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.1996:226-231. [14]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496. [15]WEI L,GAO L,LI J H,et al.Traffic sub-area division method based on density peak clustering[J].Journal of Jilin University(Engineering and Technology Edition),2023,53(1):124-131. [16]WANG F Y,ZHANG D S,XIAO Y T.Density Peak Algorithm Based on Weighted Shared Nearest Neighbor and Accumulated Sequence[J].Computer Engineering,2022,48(4):61-69. [17]ZHANG X Y,YUN W G.Sharing K-nearest Neighbors andMultiple Assignment PoliciesDensity Peaks Clustering Algorithm[J].Journal of Chinese Computer Systems,2023,44(1):75-82. [18]DU M,DING S,JIA H.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99(5):135-145. [19]JIANG J,CHEN Y,MENG X,et al.A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process[J].Physica A:Statistical Mechanics and Its Applications,2019,523(6):702-713. [20]LIU R,WANG H,YU X.Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226. [21]CHEN J,YU P.A domain adaptive density clustering algorithm for data with varying density distribution[J].IEEE Transactions on Knowledge and Data Engineering,2021,33(6):2310-2321. [22]LIU Y H,MA Z M,YU F.Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J].Knowledge-Based Systems,2017,133(10):208-220. [23]ZHANG Z,ZHU Q,ZHU F,et al.Density decay graph-based density peak clustering[J].Knowledge-Based Systems,2021,224:107075. [24]XIE J,GAO H,XIE W,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,354(8):19-40. [25]SUN L,QIN X,DING W,et al.Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy[J].Neurocomputing,2022,473:159-181. [26]GUO W,WANG W,ZHAO S,et al.Density peak clusteringwith connectivity estimation[J].Knowledge-Based Systems,2022,243:108501. |
|