计算机科学 ›› 2025, Vol. 52 ›› Issue (11A): 241200068-11.doi: 10.11896/jsjkx.241200068

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于张量图扩散的共享近邻密度峰值聚类算法

刘翘铭1,2,3, 魏千然2, 李智2, 王健4, 李远方2,3   

  1. 1河南大学人工智能学院 郑州 450001
    2 哈尔滨工业大学郑州研究院 郑州 450001
    3 哈尔滨工业大学医学与健康学院 哈尔滨 150001
    4 河南财经政法大学计算机与信息工程学院 郑州 450001
  • 出版日期:2025-11-15 发布日期:2025-11-10
  • 通讯作者: 李远方(yuanfangli1991@163.com)
  • 作者简介:cslqm@henu.edu.cn
  • 基金资助:
    国家自然科学基金青年项目(62402145);中国博士后科学基金面上项目(GZC20233467);国家资助博士后研究人员计划(2023M740932);河南省自然科学基金青年项目(242300421705)

Tensor Graph Diffusion Share Nearest Neighbor Density Peaks Clustering

LIU Qiaoming1,2,3, WEI Qianran2, LI Zhi2, WANG Jian4, LI Yuanfang2,3   

  1. 1 School of Artifical Intelligence,Henan University,Zhengzhou 450001,China
    2 Zhengzhou Research Institute,Harbin Institute of Technology,Zhengzhou 450001,China
    3 School of Medicine and Health,Harbin Institute of Technology,Harbin 150001,China
    4 School of Computer and Information Engineering,Henan University of Economics and Law,Zhengzhou 450001,China
  • Online:2025-11-15 Published:2025-11-10
  • Supported by:
    National Natural Science Foundation of China(62402145),China Postdoctoral Science Foundation under General Program(GZC20233467),Postdoctoral Fellowship Program of CPSF(2023M740932) and Natural Science Foundation of Henan Province,China(242300421705).

摘要: 密度峰值聚类(Density Peak Clustering,DPC)是一种基于密度划分思想的聚类分析方法。在处理高维数据时,DPC算法在相似度计算过程与聚类分配过程中分别存在“聚集”效应问题和“多米诺”效应问题,限制了DPC在实际应用中的分析效率。针对以上问题,提出基于张量图扩散的共享近邻密度峰值聚类算法TGD-SNN-DPC,该算法首先基于张量图理论设计张量图自适应构建模块,挖掘数据点间多样性局部邻域信息。在此基础上,提出高效张量图扩散学习模块,引入张量图高效更新策略,在不增加模型计算负担的前提下,利用该模块挖掘数据全局高阶拓扑信息,利用以上两个模块获得合理的鲁棒性更强的样本间邻接相似度信息。设计自适应共享邻域聚类模块,以张量图扩散高阶邻接矩阵为基础,引入基于共享近邻信息的样本局部密度与相对距离,利用自适应邻域非聚类中心样本分配策略,提升模型矩阵的准确性。在6个合成数据集和12个真实UCI数据集上的实验表明TGD-SNN-DPC算法在准确度(ACC)、调整兰德系数(ARI)和标准互信息(NMI)方面均优于基准算法。

关键词: 聚类分析, 密度峰值聚类, 张量图, 扩散过程, 共享近邻

Abstract: Density Peak Clustering(DPC) is based on the idea of density clustering.When dealing with high-dimensional data,the DPC algorithm has the problem of the “clustering” effect and “domino” effect in the process of similarity calculation and cluster distribution respectively,which limits the analysis efficiency of DPC in practical application.To solve these problems,a Tensor Graph Diffusion Share Nearest Neighbor Density Peaks Clustering(TGD-SNN-DPC) is proposed.Firstly,an adaptive tensor graph construction module is designed based on tensor graph theory to mine diverse local neighborhood information among data points.On this basis,an efficient tensor graph diffusion learning module is proposed and an efficient update strategy is introduced.On the premise of not increasing the computing burden of the model,this module is used to mine the global high-level topological information of the data,and the adjacency similarity information between samples with stronger robustness is obtained by using the above two modules.Finally,the adaptive shared neighborhood clustering module is designed.Based on the high-order adjacency matrix generated by tensor graph diffusion,the local density and relative distance based on shared neighbor information are introduced,and the adaptive neighborhood non-center sample allocation strategy is designed to improve the accuracy of the model matrix.Experimental results on 6 synthetic datasets and 12 real UCI datasets show that TGD-SNN-DPC algorithm outperforms the benchmark algorithm in clustering.

Key words: Clustering analysis, Density peak clustering, Tensor graph, Diffusion process, Shared nearest neighbors

中图分类号: 

  • TP391.41
[1]DENG Z,CHOI K S,JIANG Y,et al.A survey on soft subspaceclustering [J].Information Sciences,2016,348:84-106.
[2]JAIN A K.Data clustering:50 years beyond K-means [J].Pattern Recognition Letters,2010,31(8):651-666.
[3]PARK H S,JUN C H.A simple and fast algorithm for K-me-doids clustering [J].Expert Systems with Applications,2009,36(2):3336-3341.
[4]YANG M S,LAI C Y,LIN C Y.A robust EM clustering algorithm for Gaussian mixture models [J].Pattern Recognition,2012,45(11):3950-3961.
[5]CHEN X,CAI D.Large scale spectral clustering with landmark-based representation[C]//Proceedings of the Twenty-fifth AAAI Conference on Artificial Intelligence.2011.
[6]GUHA S,RASTOGI R,SHIM K.CURE:An efficient cluste-ring algorithm for large databases [J].ACM Sigmod Record,1998,27(2):73-84.
[7]KARYPIS G,HAN E H,KUMAR V.Chameleon:Hierarchical clustering using dynamic modeling [J].Computer,1999,32(8):68-75.
[8]KHAN K,REHMAN S U,AZIZ K,et al.DBSCAN:Past,present and future[C]//Proceedings of the Fifth International Conference on the Applications of Digital Information and Web Technologies(ICADIWT 2014).IEEE,2014.
[9]ANKERST M,BREUNIG M M,KRIEGEL H P,et al.OP-TICS:Ordering points to identify the clustering structure [J].ACM Sigmod Record,1999,28(2):49-60.
[10]FORSTER A,MURPHY A L.CLIQUE:Role-free clusteringwith Q-learning for wireless sensor networks[C]//Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems.IEEE,2009.
[11]TAO Z,LI J,FU H,et al.From Ensemble Clustering to Subspace Clustering:Cluster Structure Encoding [J].IEEE Transactions on Neural Networks and Learning Systems,2021,34(5):2670-2681.
[12]LIKAS A,VLASSIS N,VERBEEK J J.The global k-meansclustering algorithm [J].Pattern Recognition,2003,36(2):451-461.
[13]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks [J].Science,2014,344(6191):1492-1496.
[14]ZHANG Z,ZHU Q,ZHU F,et al.Density decay graph-baseddensity peak clustering [J].Knowledge-Based Systems,2021,224:107075.
[15]GUAN J,LI S,HE X,et al.Peak-graph-based fast density peak clustering for image segmentation [J].IEEE Signal Processing Letters,2021,28:897-901.
[16]SHI Y,CHEN Z,QI Z,et al.A novel clustering-based imagesegmentation via density peaks algorithm with mid-level feature [J].Neural Computing and Applications,2017,28:29-39.
[17]ANANDARAO S,CHELLASAMY S H.Detection of Hot To-pic in Tweets Using Modified Density Peak Clustering [J].Ingénierie Des Systèmes D’Information,2021,26(6):523-531.
[18]PENG D,GUI Z,WANG D,et al.Clustering by measuring localdirection centrality for data with heterogeneous density and weak connectivity [J].Nature Communications,2022,13(1):5455.
[19]KAUSAR S,MEHMOOD R,IQBAL M S,et al.Density peaks based clustering for single-cell interpretation via multikernel learning [J].Procedia Computer Science,2019,147:71-76.
[20]SUN L,QIN X,DING W,et al.Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy [J].Neurocomputing,2022,473:159-181.
[21]GUO W,WANG W,ZHAO S,et al.Density peak clusteringwith connectivity estimation [J].Knowledge-Based Systems,2022,243:108501.
[22]SUN L,QIN X,DING W,et al.Density peaks clustering based on k-nearest neighbors and self-recommendation [J].International Journal of Machine Learning and Cybernetics,2021,12:1913-1938.
[23]LAOHAKIAT S,SA-ING V.An incremental density-basedclustering framework using fuzzy local clustering [J].Information Sciences,2021,547:404-426.
[24]YU H,CHEN L,YAO J.A three-way density peak clustering method based on evidence theory [J].Knowledge-Based Systems,2021,211:106532.
[25]HOU J,ZHANG A,QI N.Density peak clustering based on re-lative density relationship [J].Pattern Recognition,2020,108:107554.
[26]XU X,DING S,WANG L,et al.A robust density peaks clustering algorithm with density-sensitive similarity [J].Knowledge-Based Systems,2020,200:106028.
[27]TAO X,GUO W,REN C,et al.Density peak clustering using global and local consistency adjustable manifold distance [J].Information Sciences,2021,577:769-804.
[28]DE LATHAUWER L.A survey of tensor methods[C]//Proceedings of the 2009 IEEE International Symposium on Circuits and Systems.IEEE,2009.
[29]PANAGAKIS Y,KOSSAIFI J,CHRYSOS G G,et al.Tensor methods in computer vision and deep learning [C]//Proceedings of the IEEE.2021:863-890.
[30]SAHA P K.Tensor scale:A local morphometric parameter with applications to computer vision and image processing [J].Computer Vision and Image Understanding,2005,99(3):384-413.
[31]BOUCHARD G,NARADOWSKY J,RIEDEL S,et al.Matrix and tensor factorization methods for natural language processing[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing:Tutorial Abstracts.2015.
[32]CHANG K W,YIH W T,YANG B,et al.Typed tensor decomposition of knowledge bases for relation extraction[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).2014.
[33]PAPALEXAKIS E E,FALOUTSOS C,SIDIROPOULOS N D.Tensors for data mining and data fusion:Models,applications,and scalable algorithms [J].ACM Transactions on Intelligent Systems and Technology(TIST),2016,8(2):1-44.
[34]KOLDA T G,SUN J.Scalable tensor decompositions for multi-aspect data mining[C]//Proceedings of the 2008 Eighth IEEE International Conference on Data Mining.IEEE,2008.
[35]WANG J,XU C,ZHANG J,et al.Big data analytics for intelligent manufacturing systems:A review [J].Journal of Manufacturing Systems,2022,62:738-752.
[36]WANG X,YANG L T,SONG L,et al.A tensor-based multiattributes visual feature recognition method for industrial intelligence [J].IEEE Transactions on Industrial Informatics,2020,17(3):2231-2241.
[37]COIFMAN R R,LAFON S.Diffusion maps [J].Applied andComputational Harmonic Analysis,2006,21(1):5-30.
[38]VAN DIJK D,SHARMA R,NAINYS J,et al.Recovering Gene Interactions from Single-Cell Data Using Data Diffusion [J].Cell,2018,174(3):716-729.
[39]TALMON R,COHEN I,GANNOT S,et al.Diffusion Maps for Signal Processing:A Deeper Look at Manifold-Learning Techniques Based on Kernels and Graphs [J].IEEE Signal Proces-sing Magazine,2013,30(4):75-86.
[40]YANG X,PRASAD L,LATECKI L J.Affinity learning with diffusion on tensor product graph [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,35(1):28-38.
[41]LIU R,WANG H,YU X.Shared-nearest-neighbor-based clustering by fast search and find of density peaks [J].Information Sciences,2018,450:200-226.
[42]WANG H,WANG W,YANG J,et al.Clustering by patternsimilarity in large data sets[C]//Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data.2002.
[43]WANG Y,WANG D,PANG W,et al.A systematic density-based clustering method using anchor points [J].Neurocompu-ting,2020,400:352-370.
[44]DU M,DING S,JIA H.Study on density peaks clustering based on k-nearest neighbors and principal component analysis [J].Knowledge-Based Systems,2016,99:135-145.
[45]XIE J,GAO H,XIE W,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors [J].Information Sciences,2016,354:19-40.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!