计算机科学 ›› 2021, Vol. 48 ›› Issue (6A): 220-225.doi: 10.11896/jsjkx.210100167

• 大数据&数据科学 • 上一篇    下一篇

基于地标表示的联合谱嵌入和谱旋转的谱聚类算法

李鹏, 刘力军, 黄永东   

  1. 大连民族大学理学院 辽宁 大连116600
  • 出版日期:2021-06-10 发布日期:2021-06-17
  • 通讯作者: 刘力军(manopt@163.com)
  • 作者简介:lipeng17835@163.com
  • 基金资助:
    国家自然科学基金(61002039,61572018,11761001)

Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation

LI Peng, LIU Li-jun, HUANG Yong-dong   

  1. School of Science,Dalian Minzu University,Dalian,Liaoning 116600,China
  • Online:2021-06-10 Published:2021-06-17
  • About author:LI Peng,master student.His main research interests include machine lear-ning and optimization method.
    LIU Li-jun,Ph.D,associate professor.His main research interests include neural networks,machine learning and optimization method.
  • Supported by:
    National Natural Science Foundation of China(61002039,61572018,11761001).

摘要: 经典的谱聚类算法包含两个步骤。(1)谱嵌入过程:求解Laplacian矩阵的特征值分解,得到分类指示矩阵的连续松弛解。(2)后处理过程:对谱嵌入连续松弛矩阵应用k-means或者谱旋转,得到最终的二值指示矩阵。由于有用信息的丢失,这种单独求解步骤不能保证最佳聚类结果。同时,谱聚类算法在处理大规模数据集时,存在聚类精度低、数据相似度矩阵存储开销大和 Laplacian 矩阵特征值分解计算复杂度高的问题。已有的联合谱聚类算法使用标准正交矩阵逼近非标准正交簇指示矩阵,这会导致较大的逼近误差。为了克服这一缺点,提出用一个改进的标准正交簇指示矩阵代替非正交指示矩阵,得到一个新的联合谱嵌入和谱旋转的谱聚类算法。因为两个标准正交矩阵更容易最小化,所以提出的算法可以取得更好的性能。进一步通过地标点方法对原始数据集进行稀疏特征表示,提出一种基于地标表示的联合谱嵌入和谱旋转算法(LJSESR),解决了大规模数据谱聚类的高效求解问题。实验结果表明,提出的LJSESR 算法具有可行性和有效性。

关键词: 地标表示, 联合谱聚类, 谱聚类, 谱嵌入, 谱旋转

Abstract: Classical spectral clustering algorithms consist of two separate stages.One is spectral embedding,computing eigenvalue decomposition of a Laplacian matrix to obtain a relaxed continuous indication matrix.The other is post processing,applying k-means or spectral rotation to round the real matrix into the binary cluster indicator matrix.Such a separate scheme is not guaranteed to achieve jointly optimal result because of the loss of useful information.Meanwhile,there are difficulties of low clustering precision,high storage cost for the similarity matrix and high computational complexity for the eigenvalue decomposition of Laplacian matrix.The existing joint model adopts an orthonormal real matrix to approximate the orthogonal but nonorthonormal cluster indicator matrix.The error of approximating a nonorthonormal matrix is inevitably large.To overcome the drawback,we propose replacing the nonorthonormal cluster indicator matrix with an improved orthonormal cluster indicator matrix.The proposed method is capable of obtaining better performance because it is easy to minimize the difference between two orthonormal matrices.Furthermore,a novel landmark-based joint spectral embedding and spectral rotation algorithm is proposed based on the sparse representation by landmark points,which greatly solves the effective computation of spectral clustering for large scale dataset.Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.

Key words: Joint spectral clustering, Landmark representation, Spectral clustering, Spectral embedding, Spectral rotation

中图分类号: 

  • TP181
[1] JAIN A K,MURTY M N,FLYNN P J.Data clustering:a review[J].ACM Computing Surveys,1999,31(3):264-323.
[2] MACQUEEN J.Some methods for classification and analysis of multivariate observations[J].Berkeley Symposium on Mathematical Statistics and Probability,1967,1(5):281-297.
[3] FU G Y.Optimization methods for fuzzy clustering[J].Fuzzy Sets and Systems,1998,93(3):301-309.
[4] KAUFMAN L,ROUSSEEUW P J.Finding Groups in Data:An Introduction to Cluster Analysis[J].John Wiley & Sons,New York,USA,1990.
[5] LUXBURG U V.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416.
[6] JIN J G.Review of Clustering Method[J].Computer Science,2014,41(S2):288-293.
[7] LI P Q,LI Y D,DENG X L,et al.Spectral Clustering Algorithm Based on SimRank Score[J].Computer Science,2018,45(S2):458-461.
[8] LI X Y,GUO L J.Constructing affinity matrix in spectral clustering based on neighbor propagation[J].Neurocomputing,2012,97:125-130.
[9] TAO X M,WANG R T,CHANG R,et al.Low Density Separation Density Sensitive Distance-based Spectral Clustering Algorithm[J].Acta Automatica Sinica,2020,46(7):1479-1495.
[10] CHEN J F,ZHANG M,HE Q.Heuristically Determining Cluster Numbers Based NJW Spectral Clustering Algorithm[J].Computer Science,2018,45(S2):474-479.
[11] ZHANG R G,YAO X L,ZHAO J,et al.Manifold Spectral Clustering Image Segmentation Algorithm Based on Local Geometry Features[J].Pattern Recognition and Artificial Intelligence,2020,33(4):313-324.
[12] CHARLESS F,SERGE B,FAN C,et al.Spectral grouping using the Nyström method[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2004,26(2):214-225.
[13] QIU Y F,LIU C.Spectral Clustering Algorithm Based onWeighted Ensemble Nyström Sampling[J].Pattern Recognition and Artificial Intelligence,2019,32(5):420-428.
[14] CHEN X L,CAI D.Large scale spectral clustering with landmark-based representation[C]//AAAI Conference on Artificial Intelligence.San Francisco,USA,2011:313-318.
[15] CAI D,CHEN X L.Large scale spectral clustering via landmark-based sparse representation[J].IEEE Transactions on Cybernetics,2015,45(8):1669-1680.
[16] YE M,LIU W F.Large Scale Spectral Clustering Based on Fast Landmark Sampling[J].Journal of Electronics & Information Technology,2017,39(2):278-284.
[17] ZHANG X C,ZONG L L,YOU Q Z,et al.Sampling forNyström extension based spectral clustering:incremental perspective and novel analysis[J].ACM Transactions on Know-ledge Discovery from Data,2016,11(1):1-25.
[18] CHEN X J,NIE F P,HUANG Z X,et al.Scalable normalized cut with improved spectral rotation[C]//Twenty-Sixth International Joint Conference on Artificial Intelligence.2017:1518-1524.
[19] CHU D R,ZHOU Z P.An autoencoder spectral clustering algorithm for improving landmark representation by weighted PageRank[J].CAAI Transactions on Intelligent Systems,2020,15(2):302-309.
[20] ZHANG M,ZHOU Z P.An autoencoder-based spectral clustering algorithm combined with metric fusion and landmark representation[J].CAAI Transactions on Intelligent Systems,2020,15(4):687-696.
[21] YU S X,SHI J B.Multiclass spectral clustering[C]//Proceedings of the 9th IEEE International Conference on Computer Vision(ICCV'03).USA,2003:313-319.
[22] HUANG J,NIE F P,HUANG H.Spectral rotation versus k-means in spectral clustering[C]//Proceedings of the AAAI Confe rence on Artificial Intelligence(AAAI'13).AAAI Press,2013:431-437.
[23] YANG Y,SHEN F M,HUANG Z,et al.A unified framework for discretespectral clustering[C]//International Joint Confe-rence on Artificial Intelligence.2016:2273-2279.
[24] PANG Y W,XIE J,NIE F P,et al.Spectral clustering by joint spectral embedding and spectral rotation[J].IEEE Transactions on Cybernetics,2020,50(1):247-258.
[25] ZHU J T,JANG J L,LIU T,et al.Joint spectral clustering based on optimal graph and feature selection[J].Neural Processing Letters,2020(11):1-17,
[26] ZHOU L,PING X J,XU S,et al.Cluster Ensemble Based on Spectral Clustering[J].Acta Automatica Sinica,2012,38(8):1335-1342.
[27] STOER M,WAGNER F.A simple min-cut algorithm[J].Journal of the ACM,1997,44(4):585-591.
[28] SHI J B,JITENDRA.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,11(9):888-905.
[29] NG A Y,JORDAN M I,WEISS Y.On spectral clustering:Analysis and an algorithm[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems:Natural and Synthetic.Cambridge,MA,USA,2002:849-856.
[30] LIU W,HE J F,CHANG S F.Large graph construction forscalable semi-supervised learning[C]//Proceeding,Twenty-Se-venth International Conference on Machine Learning.Haifa,Israel,2010:679-686.
[31] JOURNÉE M,NESTEROV Y,RICHTÁRIK P,et al.Genera-lized power method for sparse principal component analysis[J].Journal of Machine Learning Research,2010,11(2):517-553.
[32] DHEERU D,TANISKIDOU E K.UCI Machine Learning Repository[EB/OL].http://archive.ics.uci.edu/ml.
[1] 郭奕杉, 刘漫丹.
基于时空轨迹数据的异常检测
Anomaly Detection Based on Spatial-temporal Trajectory Data
计算机科学, 2021, 48(6A): 213-219. https://doi.org/10.11896/jsjkx.201100193
[2] 张晓琴, 安晓丹, 曹付元.
基于谱聚类的二分网络社区发现算法
Detecting Community from Bipartite Network Based on Spectral Clustering
计算机科学, 2019, 46(4): 216-221. https://doi.org/10.11896/j.issn.1002-137X.2019.04.034
[3] 李长镜,赵书良,池云仙.
一种基于谱嵌入和局部密度的离群点检测算法
Outlier Detection Algorithm Based on Spectral Embedding and Local Density
计算机科学, 2019, 46(3): 260-266. https://doi.org/10.11896/j.issn.1002-137X.2019.03.039
[4] 刘树栋, 魏嘉敏.
基于谱聚类和成对数据表示的多层感知机分类算法
Multilayer Perceptron Classification Algorithm Based on Spectral Clusteringand Simultaneous Two Sample Representation
计算机科学, 2019, 46(11A): 194-198.
[5] 王颖,杨余旺.
基于堆和邻域共存信息的KNN相似图算法
KNN Similarity Graph Algorithm Based on Heap and Neighborhood Coexistence
计算机科学, 2018, 45(5): 196-200. https://doi.org/10.11896/j.issn.1002-137X.2018.05.033
[6] 王亮,田萱.
单幅散焦图像的局部特征模糊分割算法
Local Feature Fuzzy Segmentation Algorithm for Single Defocused Image
计算机科学, 2018, 45(2): 318-321. https://doi.org/10.11896/j.issn.1002-137X.2018.02.055
[7] 常家伟, 戴牡红.
基于PageRank和谱方法的个性化推荐算法
Personalized Recommendation Algorithm Based on PageRank and Spectral Method
计算机科学, 2018, 45(11A): 398-401.
[8] 李鹏清, 李扬定, 邓雪莲, 李永钢, 方月.
一种基于SimRank得分的谱聚类算法
Spectral Clustering Algorithm Based on SimRank Score
计算机科学, 2018, 45(11A): 458-461.
[9] 陈俊芬, 张明, 何强.
基于启发式确定类数的NJW谱聚类算法
Heuristically Determining Cluster Numbers Based NJW Spectral Clustering Algorithm
计算机科学, 2018, 45(11A): 474-479.
[10] 王平心,刘强,杨习贝,米据生.
基于动态邻域的三支聚类分析
Three-way Clustering Analysis Based on Dynamic Neighborhood
计算机科学, 2018, 45(1): 62-66. https://doi.org/10.11896/j.issn.1002-137X.2018.01.009
[11] 李金泽,徐喜荣,潘子琦,李晓杰.
改进的自适应谱聚类NJW算法
Improved Adaptive Spectral Clustering NJW Algorithm
计算机科学, 2017, 44(Z6): 424-427. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.095
[12] 李效伦,丁志军.
LBSNs中的群体行程推荐方法
Group Travel Trip Recommendation Method in LBSNs
计算机科学, 2017, 44(6): 199-205. https://doi.org/10.11896/j.issn.1002-137X.2017.06.033
[13] 柯祖福,易本顺,谢秋莹.
基于非局部自相似性的谱聚类图像去噪算法
Image Denoising Method of Spectrum Clustering Based on Non-local Similarity
计算机科学, 2017, 44(5): 299-303. https://doi.org/10.11896/j.issn.1002-137X.2017.05.055
[14] 覃晓,梁伟,元昌安,唐涛.
基于遗传优化谱聚类的图形分割方法
Image Segmentation Algorithm of Spectral Clustering Optimized by Genetic
计算机科学, 2017, 44(1): 100-102. https://doi.org/10.11896/j.issn.1002-137X.2017.01.019
[15] 范菁,阮体洪,吴佳敏,董天阳.
基于二次谱聚类和HMM-RF混合模型的车辆行为识别方法研究
Vehicle Behavior Recognition Method Based on Quadratic Spectral Clustering and HMM-RF Hybrid Model
计算机科学, 2016, 43(5): 288-293. https://doi.org/10.11896/j.issn.1002-137X.2016.05.055
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!