计算机科学 ›› 2023, Vol. 50 ›› Issue (10): 59-70.doi: 10.11896/jsjkx.230600010

• 粒计算与知识发现 • 上一篇    下一篇

结合共享近邻和流形距离的自适应谱聚类算法

张喜梅1,3, 解滨1,2,3, 米据生4, 徐童童1,3, 张祎玲1,3   

  1. 1 河北师范大学计算机与网络空间安全学院 石家庄050024
    2 河北师范大学供应链大数据分析与数据安全河北省工程研究中心 石家庄050024
    3 河北师范大学河北省网络与信息安全重点实验室 石家庄050024
    4 河北师范大学数学科学学院 石家庄050024
  • 收稿日期:2023-06-01 修回日期:2023-08-08 出版日期:2023-10-10 发布日期:2023-10-10
  • 通讯作者: 解滨(xiebin_hebtu@126.com)
  • 作者简介:(zhangximei1219@163.com)
  • 基金资助:
    国家自然科学基金(62076088);北京市自然科学基金(Z210002)

Adaptive Spectral Clustering Algorithm Combining Shared Nearest Neighbors and Manifold Distance

ZHANG Ximei1,3, XIE Bin1,2,3, MI Jusheng4, XU Tongtong1,3, ZHANG Yiling1,3   

  1. 1 College of Computer and Cyber Security,Hebei Normal University,Shijiazhuang 050024,China
    2 Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security,Hebei Normal University,Shijiazhuang 050024,China
    3 Hebei Provincial Key Laboratory of Network & Information Security,Hebei Normal University,Shijiazhuang 050024,China
    4 School of Mathematical Sciences,Hebei Normal University,Shijiazhuang 050024,China
  • Received:2023-06-01 Revised:2023-08-08 Online:2023-10-10 Published:2023-10-10
  • About author:ZHANG Ximei,born in 1998,postgra-duate,is a member of China Computer Federation.Her main research interests include machine learning and network security.XIE Bin,born in 1976,Ph.D,professor,is a member of China Computer Federation.His main research interests include machine learning,granular computing and intelligent data analysis.
  • Supported by:
    National Natural Science Foundation of China(62076088) and Natural Science Foundation of Beijing,China(Z210002).

摘要: 谱聚类算法是建立在图论的基础上,将聚类问题转化为图的划分问题,能识别任意形状的类簇且易于实现,因此比传统聚类算法具有更强的适应性。然而,该算法中常用的距离度量不能同时考虑全局和局部一致性,且易受到噪声影响;聚类结果依赖由输入数据构造的相似度矩阵,且通过特征分解得到松弛划分矩阵和离散化过程的两步独立策略难以得到一个共同最优解。因此,提出一种结合共享近邻和流形距离的自适应谱聚类算法(SNN-MSC),引入一种新的具有指数项和比例因子的流形距离,可以灵活调整同一流形内数据的相似度和不同流形之间数据的相似度之比,并将密度因子纳入流形距离度量中,以消除噪声影响;采用共享近邻重新定义相似度度量,能挖掘数据点之间的空间结构和局部关系;同时,对拉普拉斯矩阵施加秩约束,使相似度矩阵中的连通分量完全等于簇个数,能够在优化求解过程中自适应优化数据相似度矩阵和聚类结构,无须再进行离散化操作。在人工数据集和UCI真实数据集上的对比实验显示,所提算法在多个聚类有效性指标上能体现出更好的性能。

关键词: 谱聚类, 流形距离, 共享近邻, 秩约束, 自适应

Abstract: Spectral clustering algorithm is built on the basis of graph theory.The clustering problem is transformed into the graph division problem,which can identify any shape of the cluster and easy to implement,so it has stronger adaptability than the traditional clustering algorithm.However,the distance measurement commonly used in this algorithm cannot consider both global and local consistency,and is easily affected by noise.The clustering results depend on the similarity matrix constructed from the input data,and the relaxation partition matrix obtained by feature decomposition and the two-step independent strategy of the dissociation process are difficult to obtain a common optimal solution.Therefore,an adaptive spectral clustering algorithm(SNN-MSC) combining shared nearest neighbors and manifold distance is proposed.A new manifold distance with exponential terms and sca-ling factors is introduced.It can flexibly adjust the similarity of data in the same manifold and the similarity ratio of data between different manifolds,and incorporate the density factor into the distance measurement of manifolds to eliminate the noise effect.The shared nearest neighbor is used to redefine the similarity measure,and the spatial structure and local relation between data points can be mined.At the same time,the rank constraint is applied to the Laplacian matrix so that the connected component in the similarity matrix is equal to the number of clusters.This method can adaptively optimize the data similarity matrix and clustering structure in the optimization process without any discretization operation.Through comparison experiments on artificial data sets and real data sets of UCI,the proposed algorithm shows better performance on multiple clustering validity indexes.

Key words: Spectral clustering, Manifold distance, Shared nearest neighbor, Rank constraint, Adaptation

中图分类号: 

  • TP181
[1]JAIN A K,MURTY M N,FLYNN P J.Data clustering:a review[J].ACM Computing Surveys,1999,31(3):264-323.
[2]LIU R,WANG H,YU X M.Shared-nearest-neighbor-basedclustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226.
[3]CHAKRABARTI S.Data mining for hypertext:A tutorial survey[J].ACM SIGKDD Explorations Newsletter,2000,1(2):1-11.
[4]SAMARIA F S,HARTER A C.Parameterisation of a stochastic model for human face identification[C]//Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.Seattle:IEEE,1994:138-142.
[5]DUAN X Y,LIU Y N,WANG X B.SDN enabled 5G-VANET:Adaptive vehicle clustering and beamformed transmission for aggregated traffic[J].IEEE Communications Magazine,2017,55(7):120-127.
[6]HU Z L,TANG J S,WANG Z M,et al.Deep learning for image-based cancer detection and diagnosis-A survey[J].Pattern Recognition,2018,83:134-149.
[7]DOUZAS G,BACAO F,LAST F.Improving imbalanced lear-ning through a heuristic oversampling method based on k-means and SMOTE[J].Information Sciences,2018,465:1-20.
[8]LI G B,ZHANG R F,CHEN C,et al.Rotation-invariant Deep Hierarchical Cluster Network for Point Cloud Analysis[J].Journal of Software,2022,33(11):4356-4378.
[9]TAO X M,GUO W J,REN C,et al.Density peak clusteringusing global and local consistency adjustable manifold distance[J].Information Sciences,2021,577:769-804.
[10]HE J,ZHOU J,WANG H Y,et al.DACA:Distributed adaptive grid decision graph based clustering algorithm[J].Software:Practice and Experience,2022,52(5):1199-1215.
[11]MELNYKOV V,WANG Y.Conditional mixture modeling andmodel-based clustering[J].Pattern Recognition,2023,133:108994.
[12]LI L S,WAN Z Q,HE H B.Incomplete multi-view clustering with joint partition and graph learning[J].IEEE Transactions on Knowledge and Data Engineering,2021,35(1):589-602.
[13]MACQUEEN J B.Some methods for classification and analysis of multivariate observations[C]//Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
[14]BEZDEK J C,EHRLICH R,FULL W.FCM:The fuzzy c-means clustering algorithm[J].Computers and Geosciences,1984,10(2/3):191-203.
[15]LI Z,TAO L.semi-supervised hierarchical clustering[C]//Proceedings of the 2011 IEEE 11th International Conference on Data Mining(ICDM).NW Washington:IEEE Computer Society,2011:982-991.
[16]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496.
[17]NG A Y,JORDAN M I,WEISS Y.On spectral clustering:ana-lysis and an algorithm[C]// 14th International Conference on Neural Information Processing Systems:Natural and Synthetic.2001:849-856.
[18]WANG L J,DING S F,JIA H J.An improvement of spectral clustering via message passing and density sensitive similarity[J].IEEE Access,2019,7:101054-101062.
[19]LIU P J,LI H Y,WANG T Y,et al.Multi-stage Method for Online Vertical Data Partitioning Based on Spectral Clustering[J].Journal of Software,2023,34(6):2804-2832.
[20]DING S F,JIA H J,DU M J,et al.p-Spectral clustering based on neighborhood attribute granulation[C]//Intelligent Information Processing VIII:9th IFIP TC 12 International Conference.Melbourne:pringer International Publishing,2016:50-58.
[21]LIU X Y,LI J W,YU H,et al.Adaptive spectral clusteringbased on shared nearest neighbors[J].Journal of Chinese Computer Systems,2011,32(9):1876-1880.
[22]ZHANG T,GE H W.Spectral Clustering Based on Density Coefficient and Shared Nearest Neighbors[J].Journal of Chinese Computer Systems,2017,38(8):1829-1833.
[23]WANG Y R,DING S F,WANG L J,et al.An improved density-based adaptive p-spectral clustering algorithm[J].International Journal of Machine Learning and Cybernetics,2021,12:1571-1582.
[24]FISCHER B,ROTH V,BUHMANN J.Clustering with the connectivity kernel[J].Advances in Neural Information Processing Systems,2003,16:89-96.
[25]YANG P,ZHU Q S,HUANG B.Spectral clustering with density sensitive similarity function[J].Knowledge-Based Systems,2011,24(5):621-628.
[26]BIAN Z,ISHIBUCHI H,WANG S T.Joint learning of spectral clustering structure and fuzzy similarity matrix of data[J].IEEE Transactions on Fuzzy Systems,2018,27(1):31-44.
[27]MONNEY A,ZHAN Y Z,JIANG Z,et al.A multi-kernel me-thod of measuring adaptive similarity for spectral clustering[J].Expert Systems with Applications,2020,159:113570.
[28]HEIN M,AUDIBERT J Y,LUXBURG U V.Graph Laplacians and their convergence on random neighborhood graphs[J].Journal of Machine Learning Research,2007,8(9):1325-1370.
[29]FAN K.On a theorem of Weyl concerning eigenvalues of linear transformations I[J].Proceedings of the National Academy of Sciences,1949,35(11):652-655.
[30]BOYD S.Advances in convex optimization:Interior-point me-thods,cone programming,andapplications[C]//Proceedings of the 41st IEEE Conference on Decision and Control.Las Vegas:IEEE,2002:1-49.
[31]WU T F,TSAI P S,HU N T,et al.Combining turning point detection and Dijkstra's algorithm to search the shortest path[J].Advances in Mechanical Engineering,2017,9(2):1-12.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!