大规模数据集下谱聚类算法的求解

计算机科学 ›› 2012, Vol. 39 ›› Issue (Z6): 312-314.

大规模数据集下谱聚类算法的求解

史卫亚，郭跃飞

(粮食信息处理与控制教育部重点实验室郑州450001)(河南工业大学信息科学与工程学院郑州450001) (复旦大学计算机科学与技术系上海200433)

出版日期:2018-11-16 发布日期:2018-11-16

Computation of Spectral Clustering Algorithm for Large-scale Data Set

Online:2018-11-16 Published:2018-11-16

摘要/Abstract

摘要： 谱聚类算法是一种流行的数据聚类方法，该算法使用特征分解技术计算邻接矩阵的特征解，但是在大规模数据集的情况下，因储存和计算的问题而无法进行求解。基于线性代数中对称矩阵的性质，提出使用部接矩阵的每一列作为迭代算法的输入样本，通过迭代计算出部接矩阵的特征解。所提算法的空间复杂度只有O(m)，时间复杂度也降低为O(pkm)。实验结果验证了算法的有效性。

关键词: 谱方法，部接矩阵，大数据集，特征分解

Abstract: Spectral clustering algorithm is a popular data clustering method. It uses eigen-decomposition technictue to extract the cigenvectors of the affinity matrix. But the method is infcasible for largcscale data set because of the store and computational problem. Motivated by the property of symmetric matrix, in this paper each column of the affinity matrix was used as the input sample for the iterative algorithm. hhe eigcnvectors of the affinity matrix could be iteratively computed. The space complexity of proposed method was only O(n)，the time complexity was reduced to O(pkm). The effectiveness of proposed method was validated from experimental results.

Key words: Spectral algorithm, Affinity matrix, I_argcscale data set, Eigen-decomposition

史卫亚，郭跃飞. 大规模数据集下谱聚类算法的求解[J]. 计算机科学, 2012, 39(Z6): 312-314. https://doi.org/

参考文献

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed