计算机科学 ›› 2018, Vol. 45 ›› Issue (11A): 474-479.

• 大数据与数据挖掘 • 上一篇    下一篇

基于启发式确定类数的NJW谱聚类算法

陈俊芬1, 张明1, 何强2   

  1. 河北大学数学与信息科学学院河北省机器学习与计算智能重点实验室 保定 0710021
    北京建筑大学理学院 北京1000442
  • 出版日期:2019-02-26 发布日期:2019-02-26
  • 通讯作者: 陈俊芬(1976-),女,博士,副教授,CCF会员,主要研究方向为数据分析与机器学习,E-mail:chenjunfen2010@126.com
  • 作者简介:张 明(1993-),男,硕士生,主要研究方向为数据分析和深度学习;何 强(1977-),男,博士,副教授,主要研究方向为机器学习。
  • 基金资助:
    本文受河北省自然科学基金(F2016201161),国家自然科学基金(61473111)资助。

Heuristically Determining Cluster Numbers Based NJW Spectral Clustering Algorithm

CHEN Jun-fen1, ZHANG Ming1, HE Qiang2   

  1. Hebei Key Laboratory of Machine Learning and Computational Intelligence,College of Mathematics and Information Sciences,Hebei University,Baoding 071002,China1
    School of Science,Beijing University of Civil Engineering and Architecture,Beijing,100044,China2
  • Online:2019-02-26 Published:2019-02-26

摘要: 基于图论理论的NJW谱聚类算法的核心思想是将数据点映射到特征空间后再利用K-means算法进行聚类,从而得到原始数据的聚类结果。NJW算法是K-means算法的推广,并且在任意形状的数据上都具有较好的聚类效果,从而有着广泛的应用。但是,类数C和高斯核函数中的尺度参数σ较大程度地影响着NJW的聚类性能;另外,K-means对随机初始值的敏感性也影响着NJW的聚类结果。为此,一种基于启发式确定类数的谱聚类算法(记为DP-NJW)被提出。该算法先根据数据的密度分布确定类中心点和类数,这些类中心点作为特征空间中K-means聚类的初始类中心,然后用NJW进行聚类。文中通过实验将DP-NJW算法和经典聚类算法在7个公共数据集上进行测试和对比,其中DP-NJW算法在5个数据集上的聚类精度高于NJW的平均聚类精度,在另2个数据集上二者持平。对比DPC算法,所提算法在5个数据集上也有不俗的聚类精度,而且DP-NJW的计算消耗较小,在较大的数据集aggregation上表现更为突出。实验结果表明,文中所提的DP-NJW算法更具优势。

关键词: NJW聚类, 尺度参数, 高斯核函数, 聚类精度, 谱聚类

Abstract: The main idea of NJW is to project data points into feature space and then to cluster using K-means algorithm,while the clustering results is considered as the clustering of the original data points.However,the number of clusters C and scaling parameter σ of Gaussian kernel function greatly influence the clustering performance of NJW algorithm,whilst the fact that K-means algorithm is sensitive to initial cluster centers also influences the clustering result of NJW algorithm.To this end,an improved NJW algorithmis presented referring to DP-NJW algorithm which heuristically determines cluster-center points and the number of clusters according to density distribution of data points,and then implements NJW algorithm to cluster.Note that,the obtained cluster-center points and the number of clusters in the first stage are the initialization of K-means algorithm in the second stage of the proposed DP-NJW algorithm.In the next section,DP-NJW is compared with state-of-the-art clustering algorithms on the given seven public datasets,where DP-NJW provided higher clustering accuracy than NJW on five datasets and even on the other two datasets.DP-NJW algorithm provided better clustering performance than DPC on the given five datasets.In addition,DP-NJW consumed less computing time than the other two algorithms,and this is especially obvious on the larger aggregation dataset.Overall,DP-NJW algorithm is superior to the state-of-the-art clustering algorithms.

Key words: Clustering accuracy, Gaussian kernel function, NJW clustering, Scaling parameter, Spectral clustering

中图分类号: 

  • TP181
[1]MACQUEENJ B.Some Methods for Classification and Analysis of Multivariate Observations[C]∥Proceedings of Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
[2]MACQUEENJ B.The Classification Problem[Z].Western Management Science Institute Working Paper,1962.
[3]COXD R.Note on Grouping[J].Journal of the American Statistical Association,1957,52(280):543-547.
[4]FISHERW D.On Grouping for Maximum Homogeneity[J]. Journal of the American Statistical Association,1958,53(284):789-798.
[5]SEBESTYENG S.Decision Making Process in Pattern Recognition[M].New York:Macmillan,1962.
[6]JAINA K.Data Clustering:50 Years Beyond K-means[J].Pattern Recognition Letters,2010,31(8):651-666.
[7]BOCK H H.Clustering Methods:A History of K-means Algorithms[M]∥Selected Contributions in Data Analysis and Classification.Springer Berlin Heidelberg,2007:161-172.
[8]王千,王成,冯振元,等.K-means聚类算法研究综述[J].电子设计工程,2012,20(7):21-24.
[9]PERONAP,FREEMAN W.A Factorization Approach to Grou-ping [C]∥Proceedings of European Conference on Computer Vision (ECCV).1998:655-670.
[10]NGA Y,JORDANM I,WEISS Y.On Spectral Clustering:Analysis and an Algorithm[C]∥Proceedings of the 14th Advances in Neural Information Processing System.2002:849-856.
[11]LI C G,YOU C,VIDAL R.Structured Sparse Subspace Clustering:A Joint Affinity Learning and Subspace Clustering Framework[J].IEEE Transactions on Image Processing,2017,26(6):2988-3001.
[12]LI C G,YOU C,VIDAL R.On Geometric Analysis of Affine Sparse Subspace Clustering[J].IEEE Journal on Selected Topics in Signal Processing,2018,12(6).
[13]LI C G,ZHANG J J,GUO J.Constrained Sparse Subspace Clustering with Side Information[C]∥Proceedings of the 24th International Conference on Pattern Recognition (ICPR).2018:2093-2099.
[14]LIANG J Q,HAN Y H,HU Q H.Semi-Supervised Image Clustering with Multimodal Information[J].Multimedia Systems,2016,22:149-160.
[15]LUXBURGU V.A Tutorial on Spectral Clustering[J].Statistics & Computing,2007,17(4):395-416.
[16]金建国.聚类方法综述[J].计算机科学,2014,41(S2):288-293.
[17]ZELNIK-MANORL,PERONA P.Self-Tuning Spectral Clustering[C]∥Proceedings of the 16th Advances in Neural Information Processing System.2004:1601-1608.
[18]WANG F,ZHANG C S.Robust Self-Tuning Semi-Supervised Learning[J].Neurocomputing,2007,70(16):2931-2939.
[19]YANG C,ZHANG X,JIAO L,et al.Self-Tuning Semi-Super-vised Spectral Clustering[C]∥International Conference on Computational Intelligence & Security.2008:1-5.
[20]KUMARV,HAHNJ,ZOUBIR A M.Band Selection for Hyperspectral Images Based on Self-Tuning Spectral Clustering[C]∥Proceedings of the 21st EuropeanSignal Processing Conference (EUSIPCO).2013.
[21]POLITOM,PERONAP.Grouping and Dimensionality Reduction by Locally Linear Embedding[C]∥Proceedings of International Conference on Neural Information Processing Systems:Natural &Synthetic.2001:1255-1262.
[22]ZENGC P,ZHOUA M,ZHANGG X.Self-adaptive Spectra Cluster Number Detecting with Particle Swarm Optimization Algorithm[C]∥Evolutionary Computation.IEEE,2016:4607-4611.
[23]CHAN P,SCHLAG M,ZIEN S.Spectral K-Way Ratio-Cut Partitioning and Clustering[C]∥Proceedings of Conference on Design Automation.1993:749-754.
[24]TENENBAUM J B,SILVAV D,LANGFORD J C.A Globa Geometric Framework for Nonlinear Dimensionality Reduction[J].Science,2000,290(5500):2319-2323.
[25]RODRIGUEZ A,LAIO A.Clustering by Fast Search and Find of Density Peaks[J].Science,2014,344(6191):1492-1496.
[26]李金泽,徐喜荣,潘子琦,等.改进的自适应谱聚类NJW 算法[J].计算机科学,2017,44(S1):424-427.
[27]周海松,黄德才.密度自适应的半监督谱聚类算法[J].计算机科学,2016,43(12):209-212.
[28]杨虎,付宇,范丹.噪音特征对聚类内部有效性的影响[J].计算机科学,2018,45(7):22-30.
[1] 郭奕杉, 刘漫丹.
基于时空轨迹数据的异常检测
Anomaly Detection Based on Spatial-temporal Trajectory Data
计算机科学, 2021, 48(6A): 213-219. https://doi.org/10.11896/jsjkx.201100193
[2] 李鹏, 刘力军, 黄永东.
基于地标表示的联合谱嵌入和谱旋转的谱聚类算法
Landmark-based Spectral Clustering by Joint Spectral Embedding and Spectral Rotation
计算机科学, 2021, 48(6A): 220-225. https://doi.org/10.11896/jsjkx.210100167
[3] 张晓琴, 安晓丹, 曹付元.
基于谱聚类的二分网络社区发现算法
Detecting Community from Bipartite Network Based on Spectral Clustering
计算机科学, 2019, 46(4): 216-221. https://doi.org/10.11896/j.issn.1002-137X.2019.04.034
[4] 刘树栋, 魏嘉敏.
基于谱聚类和成对数据表示的多层感知机分类算法
Multilayer Perceptron Classification Algorithm Based on Spectral Clusteringand Simultaneous Two Sample Representation
计算机科学, 2019, 46(11A): 194-198.
[5] 王颖,杨余旺.
基于堆和邻域共存信息的KNN相似图算法
KNN Similarity Graph Algorithm Based on Heap and Neighborhood Coexistence
计算机科学, 2018, 45(5): 196-200. https://doi.org/10.11896/j.issn.1002-137X.2018.05.033
[6] 王亮,田萱.
单幅散焦图像的局部特征模糊分割算法
Local Feature Fuzzy Segmentation Algorithm for Single Defocused Image
计算机科学, 2018, 45(2): 318-321. https://doi.org/10.11896/j.issn.1002-137X.2018.02.055
[7] 常家伟, 戴牡红.
基于PageRank和谱方法的个性化推荐算法
Personalized Recommendation Algorithm Based on PageRank and Spectral Method
计算机科学, 2018, 45(11A): 398-401.
[8] 李鹏清, 李扬定, 邓雪莲, 李永钢, 方月.
一种基于SimRank得分的谱聚类算法
Spectral Clustering Algorithm Based on SimRank Score
计算机科学, 2018, 45(11A): 458-461.
[9] 王平心,刘强,杨习贝,米据生.
基于动态邻域的三支聚类分析
Three-way Clustering Analysis Based on Dynamic Neighborhood
计算机科学, 2018, 45(1): 62-66. https://doi.org/10.11896/j.issn.1002-137X.2018.01.009
[10] 李金泽,徐喜荣,潘子琦,李晓杰.
改进的自适应谱聚类NJW算法
Improved Adaptive Spectral Clustering NJW Algorithm
计算机科学, 2017, 44(Z6): 424-427. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.095
[11] 李效伦,丁志军.
LBSNs中的群体行程推荐方法
Group Travel Trip Recommendation Method in LBSNs
计算机科学, 2017, 44(6): 199-205. https://doi.org/10.11896/j.issn.1002-137X.2017.06.033
[12] 柯祖福,易本顺,谢秋莹.
基于非局部自相似性的谱聚类图像去噪算法
Image Denoising Method of Spectrum Clustering Based on Non-local Similarity
计算机科学, 2017, 44(5): 299-303. https://doi.org/10.11896/j.issn.1002-137X.2017.05.055
[13] 覃晓,梁伟,元昌安,唐涛.
基于遗传优化谱聚类的图形分割方法
Image Segmentation Algorithm of Spectral Clustering Optimized by Genetic
计算机科学, 2017, 44(1): 100-102. https://doi.org/10.11896/j.issn.1002-137X.2017.01.019
[14] 范菁,阮体洪,吴佳敏,董天阳.
基于二次谱聚类和HMM-RF混合模型的车辆行为识别方法研究
Vehicle Behavior Recognition Method Based on Quadratic Spectral Clustering and HMM-RF Hybrid Model
计算机科学, 2016, 43(5): 288-293. https://doi.org/10.11896/j.issn.1002-137X.2016.05.055
[15] 王有华,陈笑蓉.
基于Kolmogorov复杂性的文本聚类算法改进
Improved Text Clustering Algorithm Based on Kolmogorov Complexity
计算机科学, 2016, 43(5): 243-246. https://doi.org/10.11896/j.issn.1002-137X.2016.05.045
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!