Computer Science ›› 2024, Vol. 51 ›› Issue (12): 166-173.doi: 10.11896/jsjkx.240600002

• Database & Big Data & Data Science • Previous Articles     Next Articles

GBDEN:A Fast Clustering Algorithm for Large-scale Data Based on Granular Ball

XUE Renxuan1, YI Shichao2, WANG Pingxin2   

  1. 1 School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
    2 School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
  • Received:2024-06-03 Revised:2024-08-30 Online:2024-12-15 Published:2024-12-10
  • About author:XUE Renxuan,born in 2001,postgra-duate.Her main research interests include machine learning and so on.
    YI Shichao,born in 1983,Ph.D,asso-ciate professor,master supervisor.His main research interests include fast computation,matrix analysis and three-way decision.
  • Supported by:
    National Natural Science Foundation of China(62076111).

Abstract: Clustering is a technique used to partition the objects in a dataset into groups or clusters based on their similar features,aiming to form groups where objects within each group are more similar to each other than to those in other groups.Density-based clustering is one of the unsupervised clustering methods that does not require the number of clusters to be specified in advance.On the contrary,it adaptively determines the clusters based on the density of the data.Compared to methods like K-MEANS,density-based clustering is less sensitive to the selection of initial points.It also can produce more robust and reliable clustering results.Among various density-based clustering algorithms,DENCLUE(DENsity-based CLUstEring) utilizes a hill-climbing approach,which is grounded in a solid mathematical foundation.At the same time,it performs well in datasets with considerable noise,allowing clustering of arbitrarily shaped clusters in high-dimensional datasets.However,processing large-scale datasets with DENCLUE requires significant computational resources and time.To address this challenge,this paper proposes a fast clustering algorithm for large-scale data based on granular ball.This involves creating a coarse-grained granular ball initially,which is then refined into fine-grained granular balls.These granular balls served as input for the DENCLUE algorithm for clustering.Experimental findings demonstrate the effectiveness of this approach across multiple datasets.

Key words: Clustering, Granular computing, Granular ball, DENCLUE, Kernel function

CLC Number: 

  • TP391
[1]ZHANG D Q,CHEN S C.A novel kernelized fuzzy c-means algorithm with application in medical image segmentation[J].Artificial Intelligence in Medicine,2004,32(1):37-50.
[2]WANG Q,WANG C,FENG Z Y,et al.A Review of K-means Clustering Algorithm Research[J].Electronic Design Enginee-ring,2012,20(7):21-24.
[3]CAO G H,JIAO Y Y,CHENG Q.Research on Label Clustering Based on Agglomerative Hierarchical Clustering Algorithm[J].New Technology of Library and Information Service,2008(4):23-28.
[4]WANG S,XIN S J,LIU C.Review of Density Peak Clustering Algorithm[J].Journal of East China Jiaotong University,2023,40(1):106-116.
[5]KRIEGEL H P,KROGER P,SANDER J,et al.Density-based clustering[J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2011,1(3):231-240.
[6]ZADEH L A.Toward human level machine intelligenceis itachievable? the need for a paradigm shift[J].IEEE Computational Intelligence Magazine,2008,3(3):11-22.
[7]YAO J T,VASILAKOS A V,PEDRYCZ W.Granular computing:perspectives and challenges[J].IEEE Transactions on Cybernetics,2013,43(6):1977-1989.
[8]HINNEBURG A,KEIM D A.Ageneral approach to clusteringin large databases with noise[J].Knowledge and Information Systems,2003,5:387-415.
[9]HINNEBURG A,GABRIEL HH.Denclue 2.0:Fast clustering based on kernel density estimation[C]//International Sympo-sium on Intelligent Data Analysis.Berlin,Heidelberg:Springer,2007:70-80.
[10]HE J,PAN W.A DENCLUE based approach to neuro-fuzzy system modeling[C]//2010 2nd International Conference on Advanced Computer Control.IEEE,2010:42-46.
[11]IDRISSI A,REHIOUI H,LAGHRISSI A,et al.An improvement of DENCLUE algorithm for the data clustering[C]//2015 5th International Conference on Information & Communication Technology and Accessibility(ICTA).IEEE,Marrakech,Morocco,2015:1-6.
[12]KHADER M S,Al-NAYMAT G.VDENCLUE:An EnhancedVariant of DENCLUE Algorithm[C]//Intelligent Systems and Applications:Proceedings of the 2020 Intelligent Systems Conference(IntelliSys).Springer,2021:425-436.
[13]XIA S Y,LIU Y,DING X,et al.Granular ball computing classifiers for efficient,scalable and robust learning[J].Information Sciences,2019,483:136-152.
[14]XIA S Y,ZHANG H,LI W H,et al.GBNRS:A novel rough set algorithm for fast adaptive attribute reduction in classification[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(3):1231-1242.
[15]XIA S Y,PENG D W,MENGD Y,et al.A fast adaptive k-means with no Bounds[J].IEEE Transactions on Pattern Ana-lysis and Machine Intelligence,2022,44(1):87-99.
[16]HU Q H,YU D R,XIE Z X.Numerical attribute reductionbased on neighborhood granulation and rough approximation[J].Journal of Software,2008,19(3):640-649.
[17]CHENG D D,LI Y,XIA S Y,et al.A fast granular-ball-baseddensity peaks clustering algorithm for large-scale data[J].IEEE Transactions on Neural Networks and Learning Systems,2023,31(12):4878-4889.
[18]XIE J,KONG W Y,XIA S Y,et al.An Efficient Spectral Clustering Algorithm Based on Granular-Ball[J].IEEE Transactions on Knowledge & Data Engineering,2023,35(9):9743-9753.
[19]WU M.A Local Learning Approach for Clustering[C]//Twentieth Annual Conference on Neural Information Processing Systems(NIPS 2006).MIT Press,Vancouver,British Columbia,Canada,2007:1529-1536.
[20]STEINLEY D.Properties of thehubert-arable adjusted rand index[J].Psychological Methods,2004,9(3):386.
[21]MCDAID A F,GREENE D,HURLEY N.Normalized mutual information to evaluate overlapping community finding algorithms[J].arXiv:1110.2515,2011.
[1] LI Zekai, ZHONG Jiaqing, FENG Shaojun, CHEN Juan, DENG Rongyu, XU Tao, TAN Zhengyuan, ZHOU Kexing, ZHU Pengzhi, MA Zhaoyang. CPU Power Modeling Accuracy Improvement Method Based on Training Set Clustering Selection [J]. Computer Science, 2024, 51(9): 59-70.
[2] ZHOU Yu, YANG Junling, DANG Kelin. Change Detection in SAR Images Based on Evolutionary Multi-objective Clustering [J]. Computer Science, 2024, 51(9): 140-146.
[3] WANG Yiyang, LIU Fagui, PENG Lingxia, ZHONG Guoxiang. Out-of-Distribution Hard Disk Failure Prediction with Affinity Propagation Clustering and Broad Learning Systems [J]. Computer Science, 2024, 51(8): 63-74.
[4] WANG Xingeng, DU Tao, ZHOU Jin, CHEN Di, WU Yunzheng. Adaptive Density Peak Clustering Algorithm Based on Shared Nearest Neighbor [J]. Computer Science, 2024, 51(8): 97-105.
[5] SUN Haowen, DING Jiaman, LI Bowen, JIA Lianyin. Clustering Algorithm Based on Attribute Similarity and Distributed Structure Connectivity [J]. Computer Science, 2024, 51(7): 124-132.
[6] CHEN Jie, JIN Linjiang, ZHENG Hongbo, QIN Xujia. Deep Feature Learning and Feature Clustering of Streamlines in 3D Flow Fields [J]. Computer Science, 2024, 51(7): 221-228.
[7] LI Shuai, YU Juan, WU Shaocheng. Cross-lingual Text Topic Discovery Based on Ensemble Learning [J]. Computer Science, 2024, 51(6A): 230300201-8.
[8] SU Ruqi, BIAN Xiong, ZHU Songhao. Few-shot Images Classification Based on Clustering Optimization Learning [J]. Computer Science, 2024, 51(6A): 230300227-7.
[9] LEI Chao, LIU Jiang, SONG Jiawen. Time Cost Model and Optimal Configuration Method for GPU Parallel Computation of Matrix Multiplication [J]. Computer Science, 2024, 51(6A): 230300200-8.
[10] LI Zi, ZHOU Yu. Sequence-based Program Semantic Rule Mining and Violation Detection [J]. Computer Science, 2024, 51(6): 78-84.
[11] HE Yifan, HE Yulin, CUI Laizhong, HUANG Zhexue. Subspace-based I-nice Clustering Algorithm [J]. Computer Science, 2024, 51(6): 153-160.
[12] ZHANG Zhiyuan, ZHANG Weiyan, SONG Yuqiu, RUAN Tong. Multilingual Event Detection Based on Cross-level and Multi-view Features Fusion [J]. Computer Science, 2024, 51(5): 208-215.
[13] WANG Hancheng, DAI Haipeng, CHEN Zhipeng, CHEN Shusen, CHEN Guihai. Large-scale Network Community Detection Algorithm Based on MapReduce [J]. Computer Science, 2024, 51(4): 11-18.
[14] KANG Wei, LI Lihui, WEN Yimin. Semi-supervised Classification of Data Stream with Concept Drift Based on Clustering Model Reuse [J]. Computer Science, 2024, 51(4): 124-131.
[15] QIAO Fan, WANG Peng, WANG Wei. Multivariate Time Series Classification Algorithm Based on Heterogeneous Feature Fusion [J]. Computer Science, 2024, 51(2): 36-46.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!