计算机科学 ›› 2018, Vol. 45 ›› Issue (11A): 458-461.

• 大数据与数据挖掘 • 上一篇    下一篇

一种基于SimRank得分的谱聚类算法

李鹏清1, 李扬定1, 邓雪莲2, 李永钢1, 方月1   

  1. 广西师范大学广西多源信息挖掘与安全重点实验室 广西 桂林5410041
    广西中医药大学公共卫生与管理学院 南宁5302002
  • 出版日期:2019-02-26 发布日期:2019-02-26
  • 通讯作者: 邓雪莲(1979-),女,硕士,主要研究方向为数据挖掘、复杂网络,E-mail:2183451435@qq.com
  • 作者简介:李鹏清(1993-),男,硕士生,主要研究方向为数据挖掘、机器学习,E-mail:1263647631@qq.com;李扬定(1986-),男,博士,主要研究方向为医学影像分析、数据挖掘;李永钢(1989-),男,硕士,主要研究方向为数据挖掘、机器学习;方 月(1992-),男,硕士生,主要研究方向为数据挖掘、机器学习。
  • 基金资助:
    本文受国家重点研发计划项目(2016YFB1000905),国家自然科学基金(61363009,61672177,61573270,81701780),广西自然科学/青年基金(2015GXNSFCB139011,2017GXNSFBA198221),广西多源信息挖掘与安全重点实验室开放基金(16-A-01-01,16-A-01-02),广西研究生教育创新计划项目(XYCSZ2017064,XYCSZ2017067,YCSW2017065),广西研究生创新计划项目(YCSW2018094)资助。

Spectral Clustering Algorithm Based on SimRank Score

LI Peng-qing1, LI Yang-ding1, DENG Xue-lian2, LI Yong-gang1, FANG Yue1   

  1. Guangxi Key Lab of Multi-source Information Mining & Security,Guangxi Normal University,Guilin,Guangxi 541004,China1
    School of Public Health and Management,Guangxi University of Chinese Medicine,Nanning 530200,China2
  • Online:2019-02-26 Published:2019-02-26

摘要: 传统的谱聚类算法在建立相似度矩阵时仅考虑数据点与点的距离,忽略了数据点之间隐含的内在联系。针对这一问题,提出了一种基于SimRank的谱聚类算法。该算法首先用无向图数据建立邻接矩阵,并计算出基于SimRank的相似度矩阵;然后根据相似度矩阵建立拉普拉斯矩阵表达式,对其进行归一化后再进行谱分解;最后对分解得到的特征向量进行k-means聚类。在Zoo等UCI标准数据集上的实验结果表明,所提算法在聚类精确度、标准互信息和纯度3个评价指标上均优于现有的LRR(Low Rank Rrepresentation)等基于距离相似度的谱聚类算法。

关键词: k-均值聚类, SimRank得分, 拉普拉斯矩阵, 邻接矩阵, 谱聚类, 相似度矩阵

Abstract: Traditional spectral clustering algorithms only consider distance between data points,ignoring their intrinsic relation.To deal with this problem,a spectral clustering method based on SimRank score was proposed.Firstly,the method computes the adjacency matrix of the undirected graph data,and obtains the similarity matrix based on SimRank.Secondly,a Laplacian matrix expression is constructed based on similarity matrix,which is then normalized followed by spectral decomposition.Finally,a k-means clustering procedure is performed on the obtained eigenvectors to obtain the final clustering results.Experimental results on benchmark datasets from UCI data repository show that the proposed algorithm is superior to the existing spectral clustering algorithms based on distance similarity in terms of clustering accuracy,standard mutual information and purity.

Key words: Adjacency matrix, k-means clustering, Laplace matrix, Similarity matrix, SimRank score, Spectral clustering

中图分类号: 

  • TP181
[1]刘紫涵,吴鹏海,吴艳兰,等.三种谱聚类算法及其应用研究[J].计算机应用研究,2017,34(4):1026-1031.
[2]MIGUEL C.On the diameter of the commuting graph of the matrix ring over a centrally finite division ring[J].Linear Algebra &Its Applications,2016,509:276-285.
[3]LI X,DU Y,WEI Y,et al.The research of concept context graph layer division based on six degrees of separation theory[J].Journal of Computational Information Systems,2013,9(22):9219-9226.
[4]ZHANG J M,SHEN Y X.Review on spectral methods for clustering[C]∥Control Conference.IEEE,2015:3791-3796.
[5]CHE W F,FENG G C.Spectral clustering:A semi-supervised approach[J].Neuro Computing,2012,77(1):119-228.
[6]ZHAO Y C,ZHANG S C.Generalized Dimension-Reduction Framework for Recent-Biased Time Series Analysis[J].IEEE Transactions on Knowledge and Data Engineering,2006,18(2):231-244.
[7]LANGONE R,MALL R,ALZATE C,et al.Kernel Spectral Clustering and Applications[M]∥Unsupervised Learning Algorithms.Springer International Publishing,2016.
[8]李瑞琳,赵永华,黄小磊.一种基于MPI的稀疏化局部尺度并行谱聚类算法的研究与实现[J].计算机工程与科学,2016,38(5):839-847.
[9]LIU G,LIN Z,YAN S,et al.Robust recovery of subspace structures by low-rank representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1):171-184.
[10]ELHAMIFAR E,VIDAL R.Sparse subspace clustering [C]∥CVPR.2009:2790-2797.
[11]LU C Y,MIN H,ZHAO Z Q,et al.Robust and efficient subspace segmentation via least squares regression[C]∥ECCV.2012:347-360.
[12]邹小林,冯国灿.基于正则割(Ncut)的多阈值图像分割方法[J].计算机工程与应用,2012,48(19):174-178.
[13]WANG S,SISKIND J M.Image Segmentation with Ratio Cut [J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2003,25(6):675-690.
[14]SRINIVASARAO P,SURESH K,RAVI K B.Image Segmentation using Clustering Algorithms[J].International Journal of Computer Applications,2015,120:36-38.
[15]刘萍,黄纯万.基于SimRank的作者相似度计算[J].情报理论与实践,2015,38(6):109-114.
[16]ZHENG W,ZOU L,CHEN L,et al.Efficient SimRank-Based Similarity Join[J].Acm Transactions on Database Systems,2017,42(3):16.
[17]CHEN W F,FENG G C.Spectral clustering with discriminate cuts[J].Knowledge-Based Systems,2012,28(7):27-37.
[18]BOOBALAN M P,LOPEZ D,GAO X Z.Graph clustering using k-Neighbourhood Attribute Structural similarity[J].Applied Soft Computing,2016,47:216-223.
[19]ALZATE C,SUYKENS J A.Hierarchical kernel spectral clustering[J].Neural Networks,2012,35(2):21-30.
[20]刘敏,韩宾,郭有倩.一种改进的基于K-means的信息聚类算法研究[J].信息通信,2015(9):35-36.
[21]FANG R,POUYANFAR S,YANG Y,et al.Computational Health Informatics in the Big Data Age:A Survey[J].ACM Computing Surveys,2016,49(1):12.
[22]ZHU X F,LI X L,ZHANG S C.Block-Row Sparse Multiview Multilabel Learning for Image Classification[J].IEEE Transactions on Cybernetics,2016,46(2):450-461.
[23]李翠平.一种基于SimRank的结点相似度计算方法:CN104933312 A[P].2015.
[24]GAO Y,WANG M,TAO D C,et al.3-D object retrieval and recognition with hypergraph analysis [J].IEEE Transactions on Image Processing a Publication of the IEEE Signal Processing Society,2012,21(9):4290-4303.
[1] 李斌, 万源.
基于相似度矩阵学习和矩阵校正的无监督多视角特征选择
Unsupervised Multi-view Feature Selection Based on Similarity Matrix Learning and Matrix Alignment
计算机科学, 2022, 49(8): 86-96. https://doi.org/10.11896/jsjkx.210700124
[2] 高越, 傅湘玲, 欧阳天雄, 陈松龄, 闫晨巍.
基于时空自适应图卷积神经网络的脑电信号情绪识别
EEG Emotion Recognition Based on Spatiotemporal Self-Adaptive Graph ConvolutionalNeural Network
计算机科学, 2022, 49(4): 30-36. https://doi.org/10.11896/jsjkx.210900200
[3] 张杰, 岳韶华, 王刚, 刘家义, 姚小强.
基于Stackelberg与边拉普拉斯矩阵的多智能体系统
Multi-agent System Based on Stackelberg and Edge Laplace Matrix
计算机科学, 2021, 48(8): 253-262. https://doi.org/10.11896/jsjkx.200700032
[4] 郭奕杉, 刘漫丹.
基于时空轨迹数据的异常检测
Anomaly Detection Based on Spatial-temporal Trajectory Data
计算机科学, 2021, 48(6A): 213-219. https://doi.org/10.11896/jsjkx.201100193
[5] 李鹏, 刘力军, 黄永东.
基于地标表示的联合谱嵌入和谱旋转的谱聚类算法
Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation
计算机科学, 2021, 48(6A): 220-225. https://doi.org/10.11896/jsjkx.210100167
[6] 龚追飞, 魏传佳.
基于改进AdaBoost算法的复杂网络链路预测
Link Prediction of Complex Network Based on Improved AdaBoost Algorithm
计算机科学, 2021, 48(3): 158-162. https://doi.org/10.11896/jsjkx.200600075
[7] 张晓琴, 安晓丹, 曹付元.
基于谱聚类的二分网络社区发现算法
Detecting Community from Bipartite Network Based on Spectral Clustering
计算机科学, 2019, 46(4): 216-221. https://doi.org/10.11896/j.issn.1002-137X.2019.04.034
[8] 刘树栋, 魏嘉敏.
基于谱聚类和成对数据表示的多层感知机分类算法
Multilayer Perceptron Classification Algorithm Based on Spectral Clusteringand Simultaneous Two Sample Representation
计算机科学, 2019, 46(11A): 194-198.
[9] 王颖,杨余旺.
基于堆和邻域共存信息的KNN相似图算法
KNN Similarity Graph Algorithm Based on Heap and Neighborhood Coexistence
计算机科学, 2018, 45(5): 196-200. https://doi.org/10.11896/j.issn.1002-137X.2018.05.033
[10] 王亮,田萱.
单幅散焦图像的局部特征模糊分割算法
Local Feature Fuzzy Segmentation Algorithm for Single Defocused Image
计算机科学, 2018, 45(2): 318-321. https://doi.org/10.11896/j.issn.1002-137X.2018.02.055
[11] 常家伟, 戴牡红.
基于PageRank和谱方法的个性化推荐算法
Personalized Recommendation Algorithm Based on PageRank and Spectral Method
计算机科学, 2018, 45(11A): 398-401.
[12] 陈俊芬, 张明, 何强.
基于启发式确定类数的NJW谱聚类算法
Heuristically Determining Cluster Numbers Based NJW Spectral Clustering Algorithm
计算机科学, 2018, 45(11A): 474-479.
[13] 王平心,刘强,杨习贝,米据生.
基于动态邻域的三支聚类分析
Three-way Clustering Analysis Based on Dynamic Neighborhood
计算机科学, 2018, 45(1): 62-66. https://doi.org/10.11896/j.issn.1002-137X.2018.01.009
[14] 李金泽,徐喜荣,潘子琦,李晓杰.
改进的自适应谱聚类NJW算法
Improved Adaptive Spectral Clustering NJW Algorithm
计算机科学, 2017, 44(Z6): 424-427. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.095
[15] 李效伦,丁志军.
LBSNs中的群体行程推荐方法
Group Travel Trip Recommendation Method in LBSNs
计算机科学, 2017, 44(6): 199-205. https://doi.org/10.11896/j.issn.1002-137X.2017.06.033
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!