计算机科学 ›› 2024, Vol. 51 ›› Issue (12): 166-173.doi: 10.11896/jsjkx.240600002
薛任煊1, 伊士超2, 王平心2
XUE Renxuan1, YI Shichao2, WANG Pingxin2
摘要: 聚类用于将数据集中的对象划分为具有相似特征的组或类别,使得同一组内的对象之间的相似度较高,而不同组之间的相似度较低。密度聚类是无监督聚类方法之一,它不需要提前指定类簇的数量,而是根据数据的密度来自动确定。与K均值等方法相比,密度聚类对初始点的选择不敏感,因此更容易得到稳健的聚类结果。在众多的密度聚类算法中,DENCLUE(DENsity-based CLUstEring)算法采取了爬山策略,它具有坚实的数学基础,在大量噪声的数据集中具有良好的聚类性能,且在高维数据集中允许对任意形状进行聚类。但其在处理大规模数据集时,需要耗费大量的计算资源和时间。为此,使用粒计算的粒化模型来构建数据集。首先构建一个粗粒度的粒球,然后将粗粒度的粒球划分为细粒球,最后以粒球的形式作为DENCLUE算法的输入,从而进行聚类。实验结果表明,该算法在多个数据集上具有有效性。
中图分类号:
[1]ZHANG D Q,CHEN S C.A novel kernelized fuzzy c-means algorithm with application in medical image segmentation[J].Artificial Intelligence in Medicine,2004,32(1):37-50. [2]WANG Q,WANG C,FENG Z Y,et al.A Review of K-means Clustering Algorithm Research[J].Electronic Design Enginee-ring,2012,20(7):21-24. [3]CAO G H,JIAO Y Y,CHENG Q.Research on Label Clustering Based on Agglomerative Hierarchical Clustering Algorithm[J].New Technology of Library and Information Service,2008(4):23-28. [4]WANG S,XIN S J,LIU C.Review of Density Peak Clustering Algorithm[J].Journal of East China Jiaotong University,2023,40(1):106-116. [5]KRIEGEL H P,KROGER P,SANDER J,et al.Density-based clustering[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2011,1(3):231-240. [6]ZADEH L A.Toward human level machine intelligenceis itachievable? the need for a paradigm shift[J].IEEE Computational Intelligence Magazine,2008,3(3):11-22. [7]YAO J T,VASILAKOS A V,PEDRYCZ W.Granular computing:perspectives and challenges[J].IEEE Transactions on Cybernetics,2013,43(6):1977-1989. [8]HINNEBURG A,KEIM D A.Ageneral approach to clusteringin large databases with noise[J].Knowledge and Information Systems,2003,5:387-415. [9]HINNEBURG A,GABRIEL HH.Denclue 2.0:Fast clustering based on kernel density estimation[C]//International Sympo-sium on Intelligent Data Analysis.Berlin,Heidelberg:Springer,2007:70-80. [10]HE J,PAN W.A DENCLUE based approach to neuro-fuzzy system modeling[C]//2010 2nd International Conference on Advanced Computer Control.IEEE,2010:42-46. [11]IDRISSI A,REHIOUI H,LAGHRISSI A,et al.An improvement of DENCLUE algorithm for the data clustering[C]//2015 5th International Conference on Information & Communication Technology and Accessibility(ICTA).IEEE,Marrakech,Morocco,2015:1-6. [12]KHADER M S,Al-NAYMAT G.VDENCLUE:An EnhancedVariant of DENCLUE Algorithm[C]//Intelligent Systems and Applications:Proceedings of the 2020 Intelligent Systems Conference(IntelliSys).Springer,2021:425-436. [13]XIA S Y,LIU Y,DING X,et al.Granular ball computing classifiers for efficient,scalable and robust learning[J].Information Sciences,2019,483:136-152. [14]XIA S Y,ZHANG H,LI W H,et al.GBNRS:A novel rough set algorithm for fast adaptive attribute reduction in classification[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(3):1231-1242. [15]XIA S Y,PENG D W,MENGD Y,et al.A fast adaptive k-means with no Bounds[J].IEEE Transactions on Pattern Ana-lysis and Machine Intelligence,2022,44(1):87-99. [16]HU Q H,YU D R,XIE Z X.Numerical attribute reductionbased on neighborhood granulation and rough approximation[J].Journal of Software,2008,19(3):640-649. [17]CHENG D D,LI Y,XIA S Y,et al.A fast granular-ball-baseddensity peaks clustering algorithm for large-scale data[J].IEEE Transactions on Neural Networks and Learning Systems,2023,31(12):4878-4889. [18]XIE J,KONG W Y,XIA S Y,et al.An Efficient Spectral Clustering Algorithm Based on Granular-Ball[J].IEEE Transactions on Knowledge & Data Engineering,2023,35(9):9743-9753. [19]WU M.A Local Learning Approach for Clustering[C]//Twentieth Annual Conference on Neural Information Processing Systems(NIPS 2006).MIT Press,Vancouver,British Columbia,Canada,2007:1529-1536. [20]STEINLEY D.Properties of thehubert-arable adjusted rand index[J].Psychological Methods,2004,9(3):386. [21]MCDAID A F,GREENE D,HURLEY N.Normalized mutual information to evaluate overlapping community finding algorithms[J].arXiv:1110.2515,2011. |
|