计算机科学 ›› 2024, Vol. 51 ›› Issue (12): 166-173.doi: 10.11896/jsjkx.240600002

• 数据库&大数据&数据科学 • 上一篇    下一篇

GBDEN:一种基于粒球的大规模数据快速聚类方法

薛任煊1, 伊士超2, 王平心2   

  1. 1 江苏科技大学计算机学院 江苏 镇江 212100
    2 江苏科技大学理学院 江苏 镇江 212100
  • 收稿日期:2024-06-03 修回日期:2024-08-30 出版日期:2024-12-15 发布日期:2024-12-10
  • 通讯作者: 伊士超(shichaoyi@just.edu.cn)
  • 作者简介:(231210702103@stu.just.edu.cn)
  • 基金资助:
    国家自然科学基金(62076111)

GBDEN:A Fast Clustering Algorithm for Large-scale Data Based on Granular Ball

XUE Renxuan1, YI Shichao2, WANG Pingxin2   

  1. 1 School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
    2 School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
  • Received:2024-06-03 Revised:2024-08-30 Online:2024-12-15 Published:2024-12-10
  • About author:XUE Renxuan,born in 2001,postgra-duate.Her main research interests include machine learning and so on.
    YI Shichao,born in 1983,Ph.D,asso-ciate professor,master supervisor.His main research interests include fast computation,matrix analysis and three-way decision.
  • Supported by:
    National Natural Science Foundation of China(62076111).

摘要: 聚类用于将数据集中的对象划分为具有相似特征的组或类别,使得同一组内的对象之间的相似度较高,而不同组之间的相似度较低。密度聚类是无监督聚类方法之一,它不需要提前指定类簇的数量,而是根据数据的密度来自动确定。与K均值等方法相比,密度聚类对初始点的选择不敏感,因此更容易得到稳健的聚类结果。在众多的密度聚类算法中,DENCLUE(DENsity-based CLUstEring)算法采取了爬山策略,它具有坚实的数学基础,在大量噪声的数据集中具有良好的聚类性能,且在高维数据集中允许对任意形状进行聚类。但其在处理大规模数据集时,需要耗费大量的计算资源和时间。为此,使用粒计算的粒化模型来构建数据集。首先构建一个粗粒度的粒球,然后将粗粒度的粒球划分为细粒球,最后以粒球的形式作为DENCLUE算法的输入,从而进行聚类。实验结果表明,该算法在多个数据集上具有有效性。

关键词: 聚类, 粒计算, 粒球, DENCLUE, 核函数

Abstract: Clustering is a technique used to partition the objects in a dataset into groups or clusters based on their similar features,aiming to form groups where objects within each group are more similar to each other than to those in other groups.Density-based clustering is one of the unsupervised clustering methods that does not require the number of clusters to be specified in advance.On the contrary,it adaptively determines the clusters based on the density of the data.Compared to methods like K-MEANS,density-based clustering is less sensitive to the selection of initial points.It also can produce more robust and reliable clustering results.Among various density-based clustering algorithms,DENCLUE(DENsity-based CLUstEring) utilizes a hill-climbing approach,which is grounded in a solid mathematical foundation.At the same time,it performs well in datasets with considerable noise,allowing clustering of arbitrarily shaped clusters in high-dimensional datasets.However,processing large-scale datasets with DENCLUE requires significant computational resources and time.To address this challenge,this paper proposes a fast clustering algorithm for large-scale data based on granular ball.This involves creating a coarse-grained granular ball initially,which is then refined into fine-grained granular balls.These granular balls served as input for the DENCLUE algorithm for clustering.Experimental findings demonstrate the effectiveness of this approach across multiple datasets.

Key words: Clustering, Granular computing, Granular ball, DENCLUE, Kernel function

中图分类号: 

  • TP391
[1]ZHANG D Q,CHEN S C.A novel kernelized fuzzy c-means algorithm with application in medical image segmentation[J].Artificial Intelligence in Medicine,2004,32(1):37-50.
[2]WANG Q,WANG C,FENG Z Y,et al.A Review of K-means Clustering Algorithm Research[J].Electronic Design Enginee-ring,2012,20(7):21-24.
[3]CAO G H,JIAO Y Y,CHENG Q.Research on Label Clustering Based on Agglomerative Hierarchical Clustering Algorithm[J].New Technology of Library and Information Service,2008(4):23-28.
[4]WANG S,XIN S J,LIU C.Review of Density Peak Clustering Algorithm[J].Journal of East China Jiaotong University,2023,40(1):106-116.
[5]KRIEGEL H P,KROGER P,SANDER J,et al.Density-based clustering[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2011,1(3):231-240.
[6]ZADEH L A.Toward human level machine intelligenceis itachievable? the need for a paradigm shift[J].IEEE Computational Intelligence Magazine,2008,3(3):11-22.
[7]YAO J T,VASILAKOS A V,PEDRYCZ W.Granular computing:perspectives and challenges[J].IEEE Transactions on Cybernetics,2013,43(6):1977-1989.
[8]HINNEBURG A,KEIM D A.Ageneral approach to clusteringin large databases with noise[J].Knowledge and Information Systems,2003,5:387-415.
[9]HINNEBURG A,GABRIEL HH.Denclue 2.0:Fast clustering based on kernel density estimation[C]//International Sympo-sium on Intelligent Data Analysis.Berlin,Heidelberg:Springer,2007:70-80.
[10]HE J,PAN W.A DENCLUE based approach to neuro-fuzzy system modeling[C]//2010 2nd International Conference on Advanced Computer Control.IEEE,2010:42-46.
[11]IDRISSI A,REHIOUI H,LAGHRISSI A,et al.An improvement of DENCLUE algorithm for the data clustering[C]//2015 5th International Conference on Information & Communication Technology and Accessibility(ICTA).IEEE,Marrakech,Morocco,2015:1-6.
[12]KHADER M S,Al-NAYMAT G.VDENCLUE:An EnhancedVariant of DENCLUE Algorithm[C]//Intelligent Systems and Applications:Proceedings of the 2020 Intelligent Systems Conference(IntelliSys).Springer,2021:425-436.
[13]XIA S Y,LIU Y,DING X,et al.Granular ball computing classifiers for efficient,scalable and robust learning[J].Information Sciences,2019,483:136-152.
[14]XIA S Y,ZHANG H,LI W H,et al.GBNRS:A novel rough set algorithm for fast adaptive attribute reduction in classification[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(3):1231-1242.
[15]XIA S Y,PENG D W,MENGD Y,et al.A fast adaptive k-means with no Bounds[J].IEEE Transactions on Pattern Ana-lysis and Machine Intelligence,2022,44(1):87-99.
[16]HU Q H,YU D R,XIE Z X.Numerical attribute reductionbased on neighborhood granulation and rough approximation[J].Journal of Software,2008,19(3):640-649.
[17]CHENG D D,LI Y,XIA S Y,et al.A fast granular-ball-baseddensity peaks clustering algorithm for large-scale data[J].IEEE Transactions on Neural Networks and Learning Systems,2023,31(12):4878-4889.
[18]XIE J,KONG W Y,XIA S Y,et al.An Efficient Spectral Clustering Algorithm Based on Granular-Ball[J].IEEE Transactions on Knowledge & Data Engineering,2023,35(9):9743-9753.
[19]WU M.A Local Learning Approach for Clustering[C]//Twentieth Annual Conference on Neural Information Processing Systems(NIPS 2006).MIT Press,Vancouver,British Columbia,Canada,2007:1529-1536.
[20]STEINLEY D.Properties of thehubert-arable adjusted rand index[J].Psychological Methods,2004,9(3):386.
[21]MCDAID A F,GREENE D,HURLEY N.Normalized mutual information to evaluate overlapping community finding algorithms[J].arXiv:1110.2515,2011.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!