计算机科学 ›› 2025, Vol. 52 ›› Issue (10): 79-89.doi: 10.11896/jsjkx.240800102

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于相对邻近度的自适应谱聚类算法

原泽菲, 张正军, 姜国林   

  1. 南京理工大学数学与统计学院 南京 210094
  • 收稿日期:2024-08-20 修回日期:2024-11-29 出版日期:2025-10-15 发布日期:2025-10-14
  • 通讯作者: 张正军(zzjnj@163.com)
  • 作者简介:(daichongtou1999@163.com)
  • 基金资助:
    国家自然科学基金(61773014)

Adaptive Spectral Clustering Algorithm Based on Relative Proximity

YUAN Zefei, ZHANG Zhengjun, JIANG Guolin   

  1. School of Mathematics and Statistics,Nanjing University of Science and Technology,Nanjing 210094,China
  • Received:2024-08-20 Revised:2024-11-29 Online:2025-10-15 Published:2025-10-14
  • About author:YUAN Zefei,born in 1999,postgra-duate.Her main research interests include data mining and machine lear-ning.
    ZHANG Zhengjun,born in 1965,Ph.D,associate professor.His main research interests include data mining,graphics technology and image processing.
  • Supported by:
    National Natural Science Foundation of China(61773014).

摘要: 针对以高斯核函数为相似性度量的传统谱聚类算法需人为设置尺度参数,相似度与样本分布结构无关的问题,定义了在自然k近邻基础上的共享邻居,结合数据点的近邻信息构造了能反映区域密度的多尺度参数,以新的尺度参数重新定义了相似性度量,提出了一种基于相对邻近度的自适应谱聚类算法(Adaptive Spectral Clustering based on Relative Proximity,RPASC)。改进的尺度参数结合了间隔尺度、顺序尺度及比例尺度等特性,体现了数据点之间的相对位置关系,反映了不同密度簇的分布特征和空间结构,提高了算法对不同分布数据集的适应性。新的相似性度量通过灵活调整局部尺度参数的大小,自适应地缩小不同密度簇边界上数据点的相似度,使聚类的簇边界更明确,有利于发现真实的簇形态。通过在人工合成数据集和UCI真实数据集上进行的实验,验证了RPASC算法在多个聚类性能指标上的有效性。

关键词: 谱聚类, 多尺度参数, 共享邻居, 自然k近邻, 相似性度量

Abstract: The traditional spectral clustering algorithm with Gaussian kernel function as the similarity measure has the problem that the scale parameter needs to be artificially set,and the similarity is not related to the sample distribution structure.In order to solve this problem,the shared neighbors based on the natural k-nearest neighbors are defined,and a multi-scale parameter reflecting the regional density is constructed based on the nearest neighbors information of the data points,and the similarity mea-sure is redefined with the new scale parameter.This paper proposes an adaptive spectral clustering algorithm based on relative proximity(RPASC).The improved scale parameter combines the characteristics of interval scale,sequence scale and proportional scale,embodying the relative position relationship between data points and reflecting the distribution characteristics and spatial structure of different density clusters,which improves the adaptability of the algorithm to different distribution datasets.The new similarity measure adaptively reduces the similarity of data points on cluster boundaries of different densities by flexibly adjusting the size of local scale parameter,making cluster boundaries more explicit,which is conducive to discovering the true cluster morphologies.Experiments on synthetic datasets and UCI real datasets verify the effectiveness of the RPASC algorithm on multiple clustering performance indicators.

Key words: Spectral clustering,Multi-scale parameter,Shared neighbors,Natural k-nearest neighbors,Similarity measure

中图分类号: 

  • TP301.6
[1]ZHANG Y L,ZHOU Y J.Review of clustering algorithms [J].Journal of Computer Applications,2019,39(7):1869-1882.
[2]SHI J,MALIK J.Normalized Cuts and Image Segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[3]LI P,LIU L J,HUANG Y D.Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation[J].Computer Science,2021,48(S1):220-225.
[4]XU X,ZHANG H,YANG C M,et al.Fair Method for Spectral Clustering to Improve Intra-cluster Fairness[J].Computer Science,2023,50(2):158-165.
[5]LI L L.A survey of spectral clustering algorithms and their applications[J].Software Guide,2016,15(7):54-56.
[6]LIAO L C,JIANG X H,ZOU F M,et al.A Spectral Clustering Method for Big Trajectory Data Mining with Latent Semantic Correlation[J].Acta Automatica Sinica,2015,43(5):956-964.
[7]HE L,LI Y,ZHANG X,et al.Incremental spectral clusteringvia fastfood features and its application to stream image segmentation[J].Symmetry,2018,10(7):272.
[8]ALSHAMMARI M,TAKATSUKA M.Approximate spectralclustering with eigenvector selection and self-tuned k[J].Pattern Recognition Letters,2019,122:31-37.
[9]XIA K J,GU X Q,ZHANG Y D.Oriented groupingconstrained spectral clustering for medical imaging segmentation[J].Multimedia Systems,2020,26(1):27-36.
[10]XU D H,LI C,CHEN T,et al.A novel low rank spectral clustering method for face identification[J].Recent Patents on Engineering,2019,13(4):387-394.
[11]WANG L,BO L F,JIAO L C.Density-Sensitive Semi-Super-vised Spectral Clustering[J].Journal of Software,2007,18(10):2412-2422.
[12]TAO X M,WANG R T,CHANG R,et al.Low Density Separation Density Sensitive Distance-based Spectral Clustering Algorithm[J].Acta Automatica Sinica,2020,46(7):1479-1495.
[13]MANOR L Z,PERONA P.Self-Tuning Spectral Clustering[C]//Proceeding of NIPS.2005:1601-1608.
[14]KONG W Z,SUN C S H,ZHANG J H,et al.Spectral clustering based onneighboringadaptive local scale[J].Journal of Image and Graphics,2012,17(4):523-529.
[15]ZHANG X,LI J,YU H.Local density adaptive similarity measurement for spectral clustering[J].Pattern Recognition Letters,2011,32(2):352-358.
[16]ZHAO X Q,LIU X L.Improved adaptive spectral clustering algorithm based on density sensitivity[J].Journal of Lanzhou University of Technology,2018,44(6):102-106.
[17]ZHANG T,GE H W.Spectral Clustering Based on Density Coefficient and Shared Nearest Neighbors[J].Journal of Chinese Computer Systems,2017,38(8):1829-1833.
[18]ZHAO Y L,CHE W G,JIN R Z.A self-adaptive spectral clustering algorithm based on an improved coefficient of variation between samples[J].Journal of Lanzhou University(Natural Sciences),2022,58(6):812-818.
[19]GE J W,YANG G X.Spectral Clustering Algorithm for Density Adaptive Neighborhood Based on Shared Nearest Neighbors[J].Computer Engineering,2021,47(8):116-123.
[20]BAI L,ZHAO X,KONG Y T,et al.Survey of Spectral Cluste-ring Algorithms[J].Computer Engineering and Applications,2021,57(14):15-26.
[21]REBAGLIATI N,VERRI A.Spectral clustering with more than K eigenvectors[J].Neurocomputing,2011,74(9):1391-1401.
[22]STEVENS S S.Mathematics,measurement,and psychophysics[M]//Handbook of Experimental Psychology.London:Wiley,1951:1-49.
[23]ZHANG L P,ZHAO J Q,LI S,et al.Research on Methods of Construction of Voronoi Diagram and Nearest Neighbor Query in Constrained Regions[J].Computer Science,2014,41(9):220-224.
[24]VINH N X,EPPS J,BAILEY J.Information theoretic measures for clusterings comparison:Variants,properties,normalization and correction for chance[J].Journal of Machine Learning Research,2010,11:2837-2854.
[25]FOWLKES E B,MALLOWS C L.A method for comparing two hierarchical clusterings[J].Journal of the American Statistical Association,1983,78(383):553-569.
[26]YIN S Z,WANG T,CHEN Q C,et al.A Class of Classification Algorithm for Binary Protocol Based on Adjusting Mutual Information[J].Ordnance Industry Automation,2020,39(6):37-41.
[27]WEI Y,ZHANG Z J,HE K L,et al.Density Peak Clustering Algorithm Based on Relative Density[J].Computer Enginee-ring,2023,49(6):53-61.
[28]FISHER R A.The UCI Machine Learning Repository[EB/OL].[1988-06-30].https://archive.ics.uci.edu/dataset/53/iris.
[29]ALCOCK R.The UCI Machine Learning Repository[EB/OL].[1999-06-07].https://archive.ics.uci.edu/dataset/139/synthetic+control+chart+time+series.
[30]ROSS Q.The UCI Machine Learning Repository[EB/OL].[1986-12-31].https://archive.ics.uci.edu/dataset/102/thyroid+disease.
[31]STREET W, WOLBERG W,MANGASARIAN O.The UCIMachine Learning Repository[EB/OL].[1995-10-31].https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic.
[32]ALPAYDIN E,KAYNAK C.The UCI Machine Learning Repository[EB/OL].[1998-06-30].https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits.
[33]CHAPMAN D,JAIN A.The UCI Machine Learning Repository[EB/OL].[1994-09-11].https://archive.ics.uci.edu/dataset/75/musk+version+2.
[34]MARKELLE K,RACHEL L,KOLBY N.The UCI MachineLearning Repository[EB/OL].[2019-10-05].https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik.
[35]ALPAYDIN E,ALIMOGLU F.The UCI Machine Learning Repository[EB/OL].[1998-06-30].https://archive.ics.uci.edu/dataset/81/pen+based+recognition+of+handwritten+digits.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!