计算机科学 ›› 2024, Vol. 51 ›› Issue (7): 124-132.doi: 10.11896/jsjkx.231000125
孙浩文, 丁家满, 李博文, 贾连印
SUN Haowen, DING Jiaman, LI Bowen, JIA Lianyin
摘要: 聚类分析针对不同的数据特点采用不同的相似性度量,现实世界中数据分布复杂,存在分布无规律、密度不均匀等现象,单独考虑实例属性相似性或分布结构连通性会影响聚类效果。为此,提出了一种基于属性相似性和分布结构连通性的聚类算法(A Clustering Algorithm Based on Attribute Similarity and Distributed Structure Connectivity,ASDSC)。首先,利用待聚类数据集中的所有数据实例构建完全无向图,定义了一种兼顾属性相似和分布结构连通的新颖相似性度量方式,用于计算节点相似性,并构造邻接矩阵更新边的权重;其次,借助邻接矩阵执行递增步长的随机游走,依据顶点的连通中心性来识别簇中心并给定簇编号,同时获取其他顶点的连通性;然后,利用连通性计算顶点间的依赖关系,并据此进行簇编号的传播,直至完成聚类。最后,为了验证该方法的聚类性能,在16个合成数据集和10个真实数据集上与5种先进聚类算法进行了对比实验,ASDSC算法取得了优异性能。
中图分类号:
[1]ZHANG Y,XIA Y Q,LIU Y,et al.Clustering sentences withdensity peaks for multi-document summarization[C]//Procee-dings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.DC:Association for Computational Linguistics,2015:1262-1267. [2]LAHIRI S,MISRA S,SANJOY KUMAR S,et al.Clustering-Based Semi-supervised Technique for Credit Card Fraud Detection[C]//Proceedings of Computational Intelligence in Communications and Business Analytics.Springer,2022:260-268. [3]LI C W,CHEN H M,LI T R,et al.A stable community detection approach for complex network based on density peak clustering and label propagation[J].Applied Intelligence,2022,52:1188-1208. [4]INUWA-DUTSE I,LIPTROTT M,KORKONTZELOS I.Amultilevel clustering technique for community detection[J].Neurocomputing,2021,441:64-78. [5]QU J L,CUI Y H.Gene set analysis with graph-embedded kernel association test[J].Bioinformatics,2022,38(6):1560-1567. [6]GAO C M,ZHAO Y,WU R Z,et al.Semantic trajectory compression via multi-resolution synchronization-based clustering[J].Knowledge-Based Systems,2019,174:177-193. [7]ZHAO K,CHONG P,QIANG C.Twin learning for similarityand clustering:a unified kernel approach[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence(AAAI'17).USA,2017:2080-2086. [8]TIAN X Y,XU D C,DU D L,et al.The spherical k-means++ algorithm via local search scheme[J].Journal of Combinatorial Optimization,2022,44:2375-2394. [9]HANAFI N,SAADATFAR H.A fast DBSCAN algorithm forbigdata based on efficient density calculation[J].Expert Systems with Applications,2022,203:117501. [10]KUMAR U,LEGENDRE C P,LEE J C,et al.On analyzingGNSS displacement field variability of Taiwan:Hierarchical Agglomerative Clustering based on Dynamic Time Warping technique[J].Computers & Geosciences,2022,169:105243. [11]LI H M,YE X C,IMAKURA A,et al.Divide-and-conquer based Large-Scale Spectral Clustering[J].Neurocomputing,2022,501:664-678. [12]FRÄNTI P,SIERANOJA S.How much can k-means be im-proved by using better initialization and repeats?[J].Pattern Recognition,2019,93:95-112. [13]GUPTA A,DATTA S,DAS S.Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering[J].Pattern Recognition Letters,2018,116:72-79. [14]OSKOUEI A G,BALAFAR M A,MOTAMED C.FK-MAWCW:Categorical fuzzy k-modes clustering with automated attribute-weight and cluster-weight learning[J].Chaos,Solitons &Fractals,2021,153(1):111494. [15]YANG Y M,LIU H M,GUAN Z Y,et al.CoHomo:A cluster-attribute correlation aware graph clustering framework[J].Neurocomputing,2020,412:327-338. [16]SCHUBERT E,SANDER J,ESTER M,et al.DBSCAN Revisi-ted,Revisited:Why and How You Should(Still)Use DBSCAN[J].ACM Transactions on Database Systems,2017,42(3):19-21. [17]ABBAS M,EL-ZOGHABI A,SHOUKRY A.DenMune:Density peak based clustering using mutual nearest neighbors[J].Pattern Recognition,2021,109:107589. [18]ZHU X T,LOY C C,GONG S G.Constructing Robust Affinity Graphs for Spectral Clustering[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition.OH:IEEE,2014:1450-1457. [19]WANG Y,MA Y,HUANG H,et al.A split-merge clustering algorithm based on the k-nearest neighbor graph[J].Information Systems,2023,111:102124. [20]CAI Y D,HUANG Z X,YIN J F.A new method to build theadaptive k-nearest neighbors similarity graph matrix for spectral clustering[J].Neurocomputing,2022,493:191-203. [21]ULTSCH A,LÖTSCH J.The fundamental clustering and projection suite(fcps):a dataset collection to test the performance of clustering and data projection algorithms[J].Data,2020,5(1):13. |
|