计算机科学 ›› 2020, Vol. 47 ›› Issue (6): 92-97.doi: 10.11896/jsjkx.190500074

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于投影的鲁棒低秩子空间聚类算法

邢毓华, 李明星   

  1. 西安理工大学自动化与信息工程学院 西安710048 (704189141 qq.com)
  • 收稿日期:2019-05-15 出版日期:2020-06-15 发布日期:2020-06-10
  • 通讯作者: 李明星(15129012577@163.com)
  • 作者简介:704189141 qq.com
  • 基金资助:
    国家自然科学基金(51307140)

Robust Low Rank Subspace Clustering Algorithm Based on Projection

XING Yu-hua, LI Ming-xing   

  1. College of Automation and Information Engineering,Xi’an University of Technology,Xi’an 710048,China
  • Received:2019-05-15 Online:2020-06-15 Published:2020-06-10
  • About author:XING Yu-hua,born in 1966,master,associate professor.His main research interests include IoT communicationtech-nology and so on.
    LI Ming-xing,born in 1993,master.Her main research interests include communication of internet of things and so on.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (51307140)

摘要: 随着大数据时代的来临,如何对海量高维数据进行有效的聚类分析并充分利用,已成为当下的热门研究课题。传统的聚类算法在处理高维数据时,聚类结果的精确度和稳定性较低,而子空间聚类算法通过分割原始数据的特征空间来得到不同的特征子集,可以大幅减小数据之间不相关特征对聚类结果的影响,挖掘出高维数据中不易展现的信息,在处理高维数据时具有显著的优势。针对现有基于图的子空间聚类算法在处理未知类型噪声以及复杂的凸问题时存在局限性的问题,在子空间聚类算法的基础上,结合空间投影理论,提出了一种基于投影的鲁棒低秩子空间聚类算法。首先对原始数据进行投影,利用编码消除投影空间的噪声,并对缺失的数据进行弥补;然后利用一种新的方法l2图来构造稀疏相似图;最后在l2图的基础上进行子空间聚类。该算法不需要对噪声的类型具有先验知识,且l2图能够很好地描述高维数据稀疏性和空间分散的特征。选取3种人脸数据集作为实验数据集,首先确定影响聚类效果的最优参数,然后从准确度、鲁棒性、时间复杂度3个方面对算法进行验证。实验结果表明,在3种人脸数据集中混入未知类型的噪声时,该算法具有较高的准确率和较低的时间复杂度,并且具有好的鲁棒性。

关键词: l2图, 高维数据, 空间投影, 噪声, 子空间聚类

Abstract: With the advent of the era of big data,how to effectively cluster,analyze and effectively use massive amounts of high-dimensional data has become a hot research topic.When the traditional clustering algorithms are used to process high-dimensional data,the accuracy and stability of the clustering results are low.The subspace clustering algorithm can reduce the feature space of the original data to form different feature subsets,reduce the influence of uncorrelated features between data on clustering results.It can mine the information that is difficult to display in high-dimensional data,and has significant advantages in processing high-dimensional data.Aiming at the limitations of existing graph-based subspace clustering algorithms in dealing with unknown type noise and solving complex convex problems,based on subspace clustering algorithm,combined with spatial projection theory,this paper proposes a projection-based robust low-rank subspace clustering algorithm.Firstly,the original data is projected,the noise of the projection space is eliminated by coding and the missing data is compensated.Then a new method map is used to construct the sparse similarity l2 graph,and finally the subspace clustering is performed on the basis of the l2 graph.The algorithm does not need a priori knowledge of the type of noise,and the l2 graph can well describe the characteristics of high-dimensional data sparsity and spatial dispersion.Three datasets of face recognition are selected as experimental datasets.Firstly,the optimal parameters affecting the clustering effect are determined,and then the algorithm is verified from three aspects:accuracy,robustness and time complexity.The experimental results show that the algorithm has high accuracy,low time complexity and good robustness,when the unknown type of noise is mixed in the datasets of face recognition.

Key words: l2 graph, High dimensional data, Noise, Space projection, Subspace clustering

中图分类号: 

  • TP311
[1]LU L.Combined central and subspace clustering for computer vision applications[C]//International Conference on Machine Learning.ACM,2006:593-600.
[2]LEE M,CHO J,CHOI C H,et al.Procrustean Normal Distribution for Non-rigid Structure from Motion[C]//Computer Vision and Pattern Recognition.IEEE,2013:1280-1287.
[3]BOULT T E,BROWN L G.Factorization-based segmentation of motions[C]//Proceedings of the IEEE Workshop on Visual Motion.IEEE,2002:179-186.
[4]LIU D,JIANG M H,YANG X F,et al.Analyzing documents with Quantum Clustering:A novel pattern recognition algorithm based on quantum mechanics[J].Pattern Recognition Letters,2016,77:8-13.
[5]LU S,CHANG D.An image segmentation method based on dynamic artificial fish swarm algorithm[C]//2012 IEEE 11th International Conference on Signal Processing.IEEE,2012:980-983.
[6]ELHAMIFAR E,VIDAL R.Clustering disjoint subspaces via sparse representation[C]//IEEE International Conference on Acoustics Speech and Signal Processing.IEEE,2010:1926-1929.
[7]WU X,CHEN X M,LI X,et al.Adaptive subspace learning:an iterative approach for document clustering[J].Neural Computing and Applications,2014,25(2):333-342.
[8]BAI J C,LI J C,DAI P F,et al.General parameterized proximal point algorithm with applications in statistical learning[J].International Journal of Computer Mathematics,2019,96(1):199-215.
[9]WU Y,ZHANG Z,HUANG T S,et al.Multibody Grouping via Orthogonal Subspace Decomposition[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR 2001).IEEE,2001:II-252-II-257 vol.2.
[10]YAN J,POLLEFEYS M.A General Framework for Motion Segmentation:Independent,Articulated,Rigid,Non-rigid,Degenerate and Non-degenerate[C]//European Conference on Computer Vision.Springer Berlin Heidelberg,2006:94-106.
[11]LU C Y,MIN H,ZHAO Z Q,et al.Robust and Efficient Subspace Segmentation via Least Squares Regression[C]//European Conference on Compater Vision.Berlin:Springer,2014:347-360.
[12]LIU G,LIN Z,YAN S,et al.Robust recovery of subspace structures by low-rank representation[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2012,35(1):171-184.
[13]ELHAMIFAR E,VIDAL R.Sparse Subspace Clustering:Algorithm,Theory,and Applications[J].IEEE transactions on pattern analysis and machine intelligence,2013,35(11):2765-2781.
[14]CHEN J,ZHANG H,MAO H,et al.Symmetric low-rank representation for subspace clustering[J].Neurocomputing,2014,173(P3):1192-1202.
[15]XIA C Q,HAN K,QI Y,et al.A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data[J].IEEE/ACM Tran-sactions on Computational Biology and Bioinformatics,2018,15(4):1315-1324.
[16]LUXBURG,ULRIKE.A tutorial on spectral clustering[J].Statistics & Computing,2007,17(4):395-416.
[17]PAPADAKIS N,BUGEAU A.Tracking with occlusions via graph cuts[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2010,33(1):144-157.
[18]HE X,YAN S,HU Y,et al.Face recognition using laplacian faces[C]//IEEE Transactions on Pattern Analysis and Machine Intelligence.2005:328-340.
[19]WANG J,YANG J,YU K,et al.Locality-constrained linear coding for image classification[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.IEEE,2010:3360-3367.
[20]FAVARO P,VIDAL R,RAVICHANDRAN A.A closed form solution to robust subspace estimation and clustering[C]//Computer Vision and Pattern Recognition.IEEE,2011:1801-1807.
[1] 周慧, 施皓晨, 屠要峰, 黄圣君.
基于主动采样的深度鲁棒神经网络学习
Robust Deep Neural Network Learning Based on Active Sampling
计算机科学, 2022, 49(7): 164-169. https://doi.org/10.11896/jsjkx.210600044
[2] 徐鸣珂, 张帆.
Head Fusion:一种提高语音情绪识别的准确性和鲁棒性的方法
Head Fusion:A Method to Improve Accuracy and Robustness of Speech Emotion Recognition
计算机科学, 2022, 49(7): 132-141. https://doi.org/10.11896/jsjkx.210100085
[3] 唐超尘, 仇洪冰, 刘鑫, 唐清华.
非均匀白噪声条件下的相干MIMO雷达角度估计
Angle Estimation of Coherent MIMO Radar Under the Condition of Non-uniform Noise
计算机科学, 2022, 49(5): 262-265. https://doi.org/10.11896/jsjkx.210300162
[4] 赵亮, 张洁, 陈志奎.
基于双图正则化的自适应多模态鲁棒特征学习
Adaptive Multimodal Robust Feature Learning Based on Dual Graph-regularization
计算机科学, 2022, 49(4): 124-133. https://doi.org/10.11896/jsjkx.210300078
[5] 刘意, 毛莺池, 程杨堃, 高建, 王龙宝.
基于邻域一致性的异常检测序列集成方法
Locality and Consistency Based Sequential Ensemble Method for Outlier Detection
计算机科学, 2022, 49(1): 146-152. https://doi.org/10.11896/jsjkx.201000156
[6] 郑建炜, 黄娟娟, 秦梦洁, 徐宏辉, 刘志.
基于非局部相似及加权截断核范数的高光谱图像去噪
Hyperspectral Image Denoising Based on Non-local Similarity and Weighted-truncated NuclearNorm
计算机科学, 2021, 48(9): 160-167. https://doi.org/10.11896/jsjkx.200600135
[7] 陶星朋, 徐宏辉, 郑建炜, 陈婉君.
基于非凸低秩矩阵逼近和全变分正则化的高光谱图像去噪
Hyperspectral Image Denoising Based on Nonconvex Low Rank Matrix Approximation and TotalVariation Regularization
计算机科学, 2021, 48(8): 125-133. https://doi.org/10.11896/jsjkx.200400143
[8] 赵敏, 刘惊雷.
基于高斯场和自适应图正则的半监督聚类
Semi-supervised Clustering Based on Gaussian Fields and Adaptive Graph Regularization
计算机科学, 2021, 48(7): 137-144. https://doi.org/10.11896/jsjkx.200800190
[9] 周钢, 郭福亮.
基于特征选择的高维数据集成学习方法研究
Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data
计算机科学, 2021, 48(6A): 250-254. https://doi.org/10.11896/jsjkx.200700102
[10] 黄雪冰, 魏佳艺, 沈文宇, 凌力.
基于自适应加权重复值滤波和同态滤波的MR图像增强
MR Image Enhancement Based on Adaptive Weighted Duplicate Filtering and Homomorphic Filtering
计算机科学, 2021, 48(6A): 21-27. https://doi.org/10.11896/jsjkx.200800183
[11] 王中元, 刘惊雷.
基于二阶近邻的核子空间聚类
Kernel Subspace Clustering Based on Second-order Neighbors
计算机科学, 2021, 48(6): 86-95. https://doi.org/10.11896/jsjkx.200800180
[12] 林云, 黄桢航, 高凡.
扩散式变阶数最大相关熵准则算法
Diffusion Variable Tap-length Maximum Correntropy Criterion Algorithm
计算机科学, 2021, 48(5): 263-269. https://doi.org/10.11896/jsjkx.200300043
[13] 石琳姗, 马创, 杨云, 靳敏.
基于SSC-BP神经网络的异常检测算法
Anomaly Detection Algorithm Based on SSC-BP Neural Network
计算机科学, 2021, 48(12): 357-363. https://doi.org/10.11896/jsjkx.201000086
[14] 滕俊元, 高猛, 郑小萌, 江云松.
噪声可容忍的软件缺陷预测特征选择方法
Noise Tolerable Feature Selection Method for Software Defect Prediction
计算机科学, 2021, 48(12): 131-139. https://doi.org/10.11896/jsjkx.201000168
[15] 巫勇, 刘永坚, 唐瑭, 王洪林, 郑建成.
基于鲁棒低秩张量恢复的高光谱图像去噪
Hyperspectral Image Denoising Based on Robust Low Rank Tensor Restoration
计算机科学, 2021, 48(11A): 303-307. https://doi.org/10.11896/jsjkx.210200103
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!