计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 100-108.doi: 10.11896/jsjkx.220800074
申秋萍1, 张清华1,2, 高满1, 代永杨1
SHEN Qiuping1, ZHANG Qinghua1,2, GAO Man1, DAI Yongyang1
摘要: DBSCAN(Density-Based Spatial Clustering of Applications with Noise)是一种经典的基于密度的聚类算法,它通过两个全局参数即半径Eps和最少点数MinPts,能够对任意形状的数据进行聚类,并自动确定类个数。但是,使用全局半径的DBSCAN对于密度不均匀数据集的聚类效果较差,且无法对重叠数据集进行聚类。因此,定义了密度递减原则和局部半径,并根据k-近邻距离自动确定局部半径,从而提出了基于局部半径的DBSCAN算法(LE-DBSCAN);然后,通过考虑近邻的标签,对二支聚类结果的临界点和噪声点进行重新划分,从而提出了基于局部半径的三支DBSCAN算法(LE3W-DBSCAN)。将LE-DBSCAN和LE3W-DBSCAN与该领域的相关算法在UCI数据集和人工数据集上进行对比,实验结果表明,所提算法在常用的硬聚类指标和软聚类指标上都具有较好的表现。
中图分类号:
[1]DAYANAM A,ELBA C T,MARCEL B,et al.Cluster analysis of urban ultrafine particles size distributions[J].Atmospheric Pollution Research,2019,10(1):45-52. [2]WU X D,ZHU X Q,WU G Q,et al.Data mining with big data[J].IEEE Transactions on Knowledge and Data Engineering,2014,26(1):97-107. [3]HUANG J J,CHEN W,LIU A,et al.Cluster query:a new query pattern on temporal knowledge graph[J].World Wide Web,2020,23(2):755-779. [4]BROWN D,JAPA A,SHI Y.A fast density-grid based clustering method[C]//IEEE 9th Annual Computing and Communication Workshop and Conference.Piscataway,NJ:IEEE,2019:48-54. [5]GONDEAU A,AOUABED Z,HIJRI M,et al.Object weigh-ting:A new clustering approach to deal with outliers and cluster overlap in computational biology[J].IEEE ACM Transactions on Computational Biology and Bioinformatics,2021,18(2):633-643. [6]XU C Y,LIN R J,CAI J Y,et al.Deep image clustering by fusing contrastive learning and neighbor relation mining[J].Knowledge Based Systems,2022,238:107967. [7]SINGH S,GANIE A H.Applications of picture fuzzy similaritymeasures in pattern recognition,clustering,and MADM[J].Expert Systems with Applications,2021,168:114264. [8]XU Z Z,SHEN D R,NIE T Z,et al.A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data[J].Information Sciences,2021,572:574-589. [9]WANG X,WANG J,YU G X,et al.Network regularized bi-clustering for cancer subtype categorization[J].Chinese Journal of Computers,2019,42(6):1274-1288. [10]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496. [11]LIU R,WANG H,YU X M.Shared-nearest-neighbor-basedclustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226. [12]ESTER M,KRIEGEL H P,SANDER J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.Portland,Oregon:AAAI Press,1996:226-231. [13]JAIN A K.Data clustering:50 years beyond k-means[J].Pattern Recognition Letters,2010,31(8):651-666. [14]D'URSO P.Informational Paradigm,management of uncertainty and theoretical formalisms in the clustering framework:A review[J].Information Sciences,2017,400:30-62. [15]PETERS G,CRESPO F A,LINGRAS P,et al.Soft clustering-Fuzzy and rough approaches and their extensions and derivatives[J].International Journal of Approximate Reasoning,2013,54(2):307-322. [16]YU H,JIAO P,YAO Y Y,et al.Detecting and refining overlapping regions in complex networks with three-way decisions[J].Information Sciences,2016,373:21-41. [17]YU H,ZHANG C,WANG G Y.A tree-based incremental overlapping clustering method using the three-way decision theory[J].Knowledge Based Systems,2016,91:189-203. [18]WANG P X,YAO Y Y.CE3:A three-way clustering methodbased on mathematical morphology[J].Knowledge Based Systems,2018,155:54-65. [19]ZHANG K.A three-way c-means algorithm[J].Applied Soft Computing,2019,82:105536. [20]MAJI P,PAL S K.RFCM:A hybrid clustering algorithm using rough and Fuzzy sets[J].Fundamenta Informaticae,2007,80(4):475-496. [21]HU L H,LIU H K,ZHANG J F,et al.KR-DBSCAN:A density-based clustering algorithm based on reverse nearest neighbor and influence space[J].Expert Systems With Applications,2021,186:115763. [22]CHEN F Y.An improved DBSCAN algorithm for adaptively determining parameters in multi-density environment[C]//2nd International Conference on Artificial Intelligence and Information Systems.New York:ACM,2021:1-4. [23]YU H,CHEN L Y,YAO J T,et al.A three-way clusteringmethod based on an improved DBSCAN algorithm[J].Physica A:Statistical Mechanics and its Applications,2019,535:122289. [24]GIONIS A,MANNILA H,TSAPARAS P.Clustering aggregation[J].ACM Transactions on Knowledge Discovery from Data,2007,1(1):1-30. [25]ZAHN C T.Graph-theoretical methods for detecting and describing gestalt clusters[J].IEEE Transactions on Computers,1971,100(1):68-86. [26]CHANG H,YEUNG D Y.Robust path-based spectral cluste-ring[J].Pattern Recognition,2008,41(1):191-203. [27]JAIN A,LAW M.Data clustering:A user's dilemma[J].Lecture Notes in Computer Science,2005,3776:1-10. [28]FU L M,MEDICO E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data[J].BMC bioinforma-tics,2007,8(1):3-3. [29]HOU J,ZHANG A H,QI N M.Density peak clustering based on relative density relationship[J].Pattern Recognition,2020,108:107554. [30]XIE J Y,GAO H C,XIE W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors[J].Information Sciences,2016,354:19-40. |
|