Computer Science ›› 2024, Vol. 51 ›› Issue (7): 124-132.doi: 10.11896/jsjkx.231000125

• Database & Big Data & Data Science • Previous Articles     Next Articles

Clustering Algorithm Based on Attribute Similarity and Distributed Structure Connectivity

SUN Haowen, DING Jiaman, LI Bowen, JIA Lianyin   

  1. Faculty of Information Engineering and Automation,Kunming University of Science and Technology,Kunming 650500,China
    Artificial Intelligence Key Laboratory of Yunnan Province,Kunming University of Science and Technology,Kunming 650500,China
  • Received:2023-10-18 Revised:2024-03-29 Online:2024-07-15 Published:2024-07-10
  • About author:SUN Haowen,born in 1999,master,is a member of CCF(No.P9555G).His main research interests include machine learning and data mining.
    DING Jiaman,born in 1974,professor,is a member of CCF(No.77726M).His main research interests include data mining,cloud computing and machine learning.
  • Supported by:
    National Natural Science Foundation of China(62262034,62262035) andRanking the Top of the List for Science and Technology Project of Yunnan Province(202204BW050001).

Abstract: According to different data characteristics,clustering analysis adopts different similarity measures.However,the data distribution is complex in the real world,and there are various phenomena such as irregular distribution and uneven density.Considering attribute similarity or distribution structure connectivity alone will reduce clustering performance.Therefore,this paper proposes a clustering algorithm based on attribute similarity and distributed structure connectivity(ASDSC).Firstly,a completely undirected graph is constructed using all data instances,and a novel similarity measurement method is defined to calculate the node similarity by the topology structure and the attributes similarity,and the adjacency matrix are constructed to update the weights of edges.Secondly,based on the adjacency matrix,random walk with increasing step is performed.Subsequently,the cluster centers and their numbers are obtained according to the connected centrality of nodes,and the connectivity of other nodes is also acquired.Then,the connectivity is used to calculate the dependencies among nodes,and the propagation process of cluster number is carried out accordingly until the clustering process is completed.Finally,comparative experiments with 5 advanced clustering algorithms are conducted on 16 synthetic datasets and 10 real datasets,and the result show that the ASDSC algorithm has achieved excellent performance.

Key words: Clustering, Similarity measure, Attribute similarity, Distributed structure connectivity, Cluster number propagation

CLC Number: 

  • TP391
[1]ZHANG Y,XIA Y Q,LIU Y,et al.Clustering sentences withdensity peaks for multi-document summarization[C]//Procee-dings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.DC:Association for Computational Linguistics,2015:1262-1267.
[2]LAHIRI S,MISRA S,SANJOY KUMAR S,et al.Clustering-Based Semi-supervised Technique for Credit Card Fraud Detection[C]//Proceedings of Computational Intelligence in Communications and Business Analytics.Springer,2022:260-268.
[3]LI C W,CHEN H M,LI T R,et al.A stable community detection approach for complex network based on density peak clustering and label propagation[J].Applied Intelligence,2022,52:1188-1208.
[4]INUWA-DUTSE I,LIPTROTT M,KORKONTZELOS I.Amultilevel clustering technique for community detection[J].Neurocomputing,2021,441:64-78.
[5]QU J L,CUI Y H.Gene set analysis with graph-embedded kernel association test[J].Bioinformatics,2022,38(6):1560-1567.
[6]GAO C M,ZHAO Y,WU R Z,et al.Semantic trajectory compression via multi-resolution synchronization-based clustering[J].Knowledge-Based Systems,2019,174:177-193.
[7]ZHAO K,CHONG P,QIANG C.Twin learning for similarityand clustering:a unified kernel approach[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence(AAAI'17).USA,2017:2080-2086.
[8]TIAN X Y,XU D C,DU D L,et al.The spherical k-means++ algorithm via local search scheme[J].Journal of Combinatorial Optimization,2022,44:2375-2394.
[9]HANAFI N,SAADATFAR H.A fast DBSCAN algorithm forbigdata based on efficient density calculation[J].Expert Systems with Applications,2022,203:117501.
[10]KUMAR U,LEGENDRE C P,LEE J C,et al.On analyzingGNSS displacement field variability of Taiwan:Hierarchical Agglomerative Clustering based on Dynamic Time Warping technique[J].Computers & Geosciences,2022,169:105243.
[11]LI H M,YE X C,IMAKURA A,et al.Divide-and-conquer based Large-Scale Spectral Clustering[J].Neurocomputing,2022,501:664-678.
[12]FRÄNTI P,SIERANOJA S.How much can k-means be im-proved by using better initialization and repeats?[J].Pattern Recognition,2019,93:95-112.
[13]GUPTA A,DATTA S,DAS S.Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering[J].Pattern Recognition Letters,2018,116:72-79.
[14]OSKOUEI A G,BALAFAR M A,MOTAMED C.FK-MAWCW:Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning[J].Chaos,Solitons &Fractals,2021,153(1):111494.
[15]YANG Y M,LIU H M,GUAN Z Y,et al.CoHomo:A cluster-attribute correlation aware graph clustering framework[J].Neurocomputing,2020,412:327-338.
[16]SCHUBERT E,SANDER J,ESTER M,et al.DBSCAN Revisi-ted,Revisited:Why and How You Should(Still)Use DBSCAN[J].ACM Transactions on Database Systems,2017,42(3):19-21.
[17]ABBAS M,EL-ZOGHABI A,SHOUKRY A.DenMune:Density peak based clustering using mutual nearest neighbors[J].Pattern Recognition,2021,109:107589.
[18]ZHU X T,LOY C C,GONG S G.Constructing Robust Affinity Graphs for Spectral Clustering[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition.OH:IEEE,2014:1450-1457.
[19]WANG Y,MA Y,HUANG H,et al.A split-merge clustering algorithm based on the k-nearest neighbor graph[J].Information Systems,2023,111:102124.
[20]CAI Y D,HUANG Z X,YIN J F.A new method to build theadaptive k-nearest neighbors similarity graph matrix for spectral clustering[J].Neurocomputing,2022,493:191-203.
[21]ULTSCH A,LÖTSCH J.The fundamental clustering and projection suite(fcps):a dataset collection to test the performance of clustering and data projection algorithms[J].Data,2020,5(1):13.
[1] CHEN Jie, JIN Linjiang, ZHENG Hongbo, QIN Xujia. Deep Feature Learning and Feature Clustering of Streamlines in 3D Flow Fields [J]. Computer Science, 2024, 51(7): 221-228.
[2] LI Shuai, YU Juan, WU Shaocheng. Cross-lingual Text Topic Discovery Based on Ensemble Learning [J]. Computer Science, 2024, 51(6A): 230300201-8.
[3] GAO Jianlei, LUO Minxia. Similarity Measure Between Picture Fuzzy Sets and Its Application in Pattern Recognition [J]. Computer Science, 2024, 51(6A): 230500153-5.
[4] SU Ruqi, BIAN Xiong, ZHU Songhao. Few-shot Images Classification Based on Clustering Optimization Learning [J]. Computer Science, 2024, 51(6A): 230300227-7.
[5] LI Zi, ZHOU Yu. Sequence-based Program Semantic Rule Mining and Violation Detection [J]. Computer Science, 2024, 51(6): 78-84.
[6] HE Yifan, HE Yulin, CUI Laizhong, HUANG Zhexue. Subspace-based I-nice Clustering Algorithm [J]. Computer Science, 2024, 51(6): 153-160.
[7] ZHANG Zhiyuan, ZHANG Weiyan, SONG Yuqiu, RUAN Tong. Multilingual Event Detection Based on Cross-level and Multi-view Features Fusion [J]. Computer Science, 2024, 51(5): 208-215.
[8] WANG Hancheng, DAI Haipeng, CHEN Zhipeng, CHEN Shusen, CHEN Guihai. Large-scale Network Community Detection Algorithm Based on MapReduce [J]. Computer Science, 2024, 51(4): 11-18.
[9] KANG Wei, LI Lihui, WEN Yimin. Semi-supervised Classification of Data Stream with Concept Drift Based on Clustering Model Reuse [J]. Computer Science, 2024, 51(4): 124-131.
[10] QIAO Fan, WANG Peng, WANG Wei. Multivariate Time Series Classification Algorithm Based on Heterogeneous Feature Fusion [J]. Computer Science, 2024, 51(2): 36-46.
[11] YANG Bo, LUO Jiachen, SONG Yantao, WU Hongtao, PENG Furong. Time Series Clustering Method Based on Contrastive Learning [J]. Computer Science, 2024, 51(2): 63-72.
[12] GUO Shangzhi, LIAO Xiaofeng, XIAN Kaiyi. Logical Regression Click Prediction Algorithm Based on Combination Structure [J]. Computer Science, 2024, 51(2): 73-78.
[13] HUANG Lulu, TANG Shuyu, ZHANG Wei, DAI Xiangguang. Non-negative Matrix Factorization Parallel Optimization Algorithm Based on Lp-norm [J]. Computer Science, 2024, 51(2): 100-106.
[14] XU Jie, WANG Lisong. Contrastive Clustering with Consistent Structural Relations [J]. Computer Science, 2023, 50(9): 123-129.
[15] HU Shen, QIAN Yuhua, WANG Jieting, LI Feijiang, LYU Wei. Super Multi-class Deep Image Clustering Model Based on Contrastive Learning [J]. Computer Science, 2023, 50(9): 192-201.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!