计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 100-108.doi: 10.11896/jsjkx.220800074

• 粒计算与知识发现 • 上一篇    下一篇

基于局部半径的三支DBSCAN算法

申秋萍1, 张清华1,2, 高满1, 代永杨1   

  1. 1 重庆邮电大学计算智能重庆市重点实验室 重庆 400065
    2 重庆邮电大学旅游多源数据感知与决策技术文化和旅游部重点实验室 重庆 400065
  • 收稿日期:2022-08-08 修回日期:2022-11-21 出版日期:2023-06-15 发布日期:2023-06-06
  • 通讯作者: 张清华(zhangqh@cqupt.edu.cn)
  • 作者简介:(sqp6750@163.com)
  • 基金资助:
    国家重点研究发展计划(2020YFC2003502);国家自然科学基金(61876201);重庆英才名家名师项目(CQYC20210202215)

Three-way DBSCAN Algorithm Based on Local Eps

SHEN Qiuping1, ZHANG Qinghua1,2, GAO Man1, DAI Yongyang1   

  1. 1 Chongqing Key Laboratory of Computational Intelligence,Chongqing University of Posts and Telecommunications,Chongqing 400065,China
    2 Key Laboratory of Tourism Multisource Data Perception and Decision,Ministry of Culture and Tourism,Chongqing University of Posts and Telecommunitions,Chongqing 400065,China
  • Received:2022-08-08 Revised:2022-11-21 Online:2023-06-15 Published:2023-06-06
  • About author:SHEN Qiuping,born in 1997,postgra-duate.Her main research interests include three-way decisions,machine learning and uncertain information processing.ZHANG Qinghua,born in 1974.Ph.D,professor,Ph.D supervisor.His main research interests include rough sets,fuzzy sets,granular computing and uncertain information processing.
  • Supported by:
    National Key Research and Development Program of China(2020YFC2003502),National Natural Science Foundation of China(61876201) and Talented Masters and Teachers Project of Chongqing(CQYC20210202215).

摘要: DBSCAN(Density-Based Spatial Clustering of Applications with Noise)是一种经典的基于密度的聚类算法,它通过两个全局参数即半径Eps和最少点数MinPts,能够对任意形状的数据进行聚类,并自动确定类个数。但是,使用全局半径的DBSCAN对于密度不均匀数据集的聚类效果较差,且无法对重叠数据集进行聚类。因此,定义了密度递减原则和局部半径,并根据k-近邻距离自动确定局部半径,从而提出了基于局部半径的DBSCAN算法(LE-DBSCAN);然后,通过考虑近邻的标签,对二支聚类结果的临界点和噪声点进行重新划分,从而提出了基于局部半径的三支DBSCAN算法(LE3W-DBSCAN)。将LE-DBSCAN和LE3W-DBSCAN与该领域的相关算法在UCI数据集和人工数据集上进行对比,实验结果表明,所提算法在常用的硬聚类指标和软聚类指标上都具有较好的表现。

关键词: 三支聚类, DBSCAN, 局部半径, 多密度, 重叠数据集

Abstract: Density-based spatial clustering of applications with noise(DBSCAN) is a classical density-based clustering algorithm.It clusters dataset of arbitrary shape through two global parameters:radius Eps and minimum number of points MinPts,and automatically determines the number of clusters.However,DBSCAN with global Eps has poor clustering effect on datasets with uneven density,and cannot cluster overlapping datasets.Therefore,the clustering principle of decreasing density and local Eps are defined in this paper.The local Eps is automatically determined according to the k-nearest neighbor distance,and the DBSCAN algorithm based on local Eps(LE-DBSCAN) is proposed.Then,by considering the labels of the nearest neighbors,the border points and noises in the two-way clustering results are redivided,and the three-way DBSCAN algorithm based on local Eps(LE3W-DBSCAN) is proposed.LE-DBSCAN and LE3W-DBSCAN are compared with the related algorithms in this field on UCI datasets and artificial datasets.Experimental results show that the proposed algorithm has good performance both in common hard clustering indices and soft clustering indices.

Key words: Three-way clustering, DBSCAN, Local Eps, Multi-density, Overlapping datasets

中图分类号: 

  • TP391
[1]DAYANAM A,ELBA C T,MARCEL B,et al.Cluster analysis of urban ultrafine particles size distributions[J].Atmospheric Pollution Research,2019,10(1):45-52.
[2]WU X D,ZHU X Q,WU G Q,et al.Data mining with big data[J].IEEE Transactions on Knowledge and Data Engineering,2014,26(1):97-107.
[3]HUANG J J,CHEN W,LIU A,et al.Cluster query:a new query pattern on temporal knowledge graph[J].World Wide Web,2020,23(2):755-779.
[4]BROWN D,JAPA A,SHI Y.A fast density-grid based clustering method[C]//IEEE 9th Annual Computing and Communication Workshop and Conference.Piscataway,NJ:IEEE,2019:48-54.
[5]GONDEAU A,AOUABED Z,HIJRI M,et al.Object weigh-ting:A new clustering approach to deal with outliers and cluster overlap in computational biology[J].IEEE ACM Transactions on Computational Biology and Bioinformatics,2021,18(2):633-643.
[6]XU C Y,LIN R J,CAI J Y,et al.Deep image clustering by fusing contrastive learning and neighbor relation mining[J].Knowledge Based Systems,2022,238:107967.
[7]SINGH S,GANIE A H.Applications of picture fuzzy similaritymeasures in pattern recognition,clustering,and MADM[J].Expert Systems with Applications,2021,168:114264.
[8]XU Z Z,SHEN D R,NIE T Z,et al.A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data[J].Information Sciences,2021,572:574-589.
[9]WANG X,WANG J,YU G X,et al.Network regularized bi-clustering for cancer subtype categorization[J].Chinese Journal of Computers,2019,42(6):1274-1288.
[10]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[11]LIU R,WANG H,YU X M.Shared-nearest-neighbor-basedclustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226.
[12]ESTER M,KRIEGEL H P,SANDER J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.Portland,Oregon:AAAI Press,1996:226-231.
[13]JAIN A K.Data clustering:50 years beyond k-means[J].Pattern Recognition Letters,2010,31(8):651-666.
[14]D'URSO P.Informational Paradigm,management of uncertainty and theoretical formalisms in the clustering framework:A review[J].Information Sciences,2017,400:30-62.
[15]PETERS G,CRESPO F A,LINGRAS P,et al.Soft clustering-Fuzzy and rough approaches and their extensions and derivatives[J].International Journal of Approximate Reasoning,2013,54(2):307-322.
[16]YU H,JIAO P,YAO Y Y,et al.Detecting and refining overlapping regions in complex networks with three-way decisions[J].Information Sciences,2016,373:21-41.
[17]YU H,ZHANG C,WANG G Y.A tree-based incremental overlapping clustering method using the three-way decision theory[J].Knowledge Based Systems,2016,91:189-203.
[18]WANG P X,YAO Y Y.CE3:A three-way clustering methodbased on mathematical morphology[J].Knowledge Based Systems,2018,155:54-65.
[19]ZHANG K.A three-way c-means algorithm[J].Applied Soft Computing,2019,82:105536.
[20]MAJI P,PAL S K.RFCM:A hybrid clustering algorithm using rough and Fuzzy sets[J].Fundamenta Informaticae,2007,80(4):475-496.
[21]HU L H,LIU H K,ZHANG J F,et al.KR-DBSCAN:A density-based clustering algorithm based on reverse nearest neighbor and influence space[J].Expert Systems With Applications,2021,186:115763.
[22]CHEN F Y.An improved DBSCAN algorithm for adaptively determining parameters in multi-density environment[C]//2nd International Conference on Artificial Intelligence and Information Systems.New York:ACM,2021:1-4.
[23]YU H,CHEN L Y,YAO J T,et al.A three-way clusteringmethod based on an improved DBSCAN algorithm[J].Physica A:Statistical Mechanics and its Applications,2019,535:122289.
[24]GIONIS A,MANNILA H,TSAPARAS P.Clustering aggregation[J].ACM Transactions on Knowledge Discovery from Data,2007,1(1):1-30.
[25]ZAHN C T.Graph-theoretical methods for detecting and describing gestalt clusters[J].IEEE Transactions on Computers,1971,100(1):68-86.
[26]CHANG H,YEUNG D Y.Robust path-based spectral cluste-ring[J].Pattern Recognition,2008,41(1):191-203.
[27]JAIN A,LAW M.Data clustering:A user's dilemma[J].Lecture Notes in Computer Science,2005,3776:1-10.
[28]FU L M,MEDICO E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data[J].BMC bioinforma-tics,2007,8(1):3-3.
[29]HOU J,ZHANG A H,QI N M.Density peak clustering based on relative density relationship[J].Pattern Recognition,2020,108:107554.
[30]XIE J Y,GAO H C,XIE W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors[J].Information Sciences,2016,354:19-40.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!