计算机科学 ›› 2021, Vol. 48 ›› Issue (7): 137-144.doi: 10.11896/jsjkx.200800190
赵敏, 刘惊雷
ZHAO Min, LIU Jing-lei
摘要: 聚类是将给定的样本分成几个不同的簇,它在机器学习、数据挖掘等领域得到了广泛应用,并受到研究人员的广泛关注。但是,传统的聚类方法仍然存在3个方面的不足。首先,由于一些数据中存在噪声和异常值,传统的聚类方法容易产生误差较大的目标函数。其次,传统的聚类方法没有使用监督信息来指导构建相似矩阵。最后,加入图正则的聚类方法在计算相似度矩阵时,邻居关系都是确定的,一旦计算错误就会导致构造图的质量低,进而影响聚类性能。因此,提出了一种基于高斯场和自适应图正则化的半监督聚类(SCGFAG)模型。该模型通过高斯场及谐波函数法引入监督信息,来指导构建相似度矩阵,实现半监督学习,还引入稀疏误差矩阵来表示稀疏噪声,如脉冲噪声、死线和条纹,并且使用l1范数来缓解稀疏噪声。此外,所提模型还引入l2,1范数来处理异常值的影响。因此,SCGFAG对数据噪声和异常值不敏感。更重要的是,SCGFAG通过引入自适应图的正则化提高了聚类性能。为了实现优化聚类的目标,提出了一种迭代更新算法—增广拉格朗日法(Augmented Lagrangian Method,ALM),分别对优化变量进行更新。在4个数据集上进行的实验表明,所提方法优于相比较的8种经典聚类方法获得了更好的聚类性能。
中图分类号:
[1]RAO S R,TRON R,VIDAL R,et al.Motion segmentation in the presence of outlying,incomplete,or corrupted trajectories[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(10):1832-1845. [2]SHI J,MALIK J.Normalized cuts and image segmentation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905. [3]LIU G,LIN Z,YAN S,et al.Robust Recovery of SubspaceStructures by Low-Rank Representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1):171-184. [4]HARTIGAN J A,WONG M A.A K-means Clustering Algorithm:Algorithm AS 136[J].Applied Statistics,1979,28(1):100-108. [5]VON LUXBURG U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17(4):395-416. [6]ZHOU S H,ZHU E,LIU X W,et al.Subspace segmentation-based robust multiple kernel clustering[J].Information Fusion,2020,53:145-154 [7]ZHOU S,LIU X,ZHU C,et al.Spectral clustering-based local and global structure preservation for feature selection[C]//International Joint Conference on Neural Networks.IEEE,2014. [8]DING C,LI T,JORDAN M.Convex and semi-nonnegative matrix factorizations[J].IEEE Transactions on Software Enginee-ring,2010,32(1):45-55. [9]ZHANG Y,KONG X W,WANG Z F,et al.Cluster analysis based on multi-view matrix Decomposition[J].Acta Automata Sinica,2018,44(12):2160-2169. [10]DING Y,LI Y Z.Intrusion detection Algorithm based on PCA and Semi-supervised clustering[J].Journal of Shandong University (Engineering Science),2012,42(5):41-46. [11]BASU S,BILENKO M,MOONEY R J,et al.A probabilisticframework for semi-supervised clustering[C]//Knowledge Discovery and Data Mining.2004:59-68. [12]LIU H.Constrained nonnegative matrix factorization for image representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(7):1299-1311. [13]CAI D,HE X,HAN J,et al.Graph regularized non-negative matrix factorization for data representation[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(8):1548-1560. [14]HUANG J, NIE F P,HUANG H,et al.Robust Manifold Nonnegative Matrix Factorization[J].ACM Transactions on Know-ledge Discovery from Data,2014,8(3):1-21. [15]ZENG K,YU J,LI C,et al.Image clustering by hyper-graphregularized non-negative matrix factorization[J].Neurocompu-ting,2014,138:209-217. [16]ZHANG X.Non-negative low rank and sparse graph for semi-supervised learning[C]//Computer Vision & Pattern Recognition.IEEE,2012. [17]NIE F,XU D,TSANG W H,et al.Flexible manifold embedding:a framework for semi-Supervised and unsupervised dimension reduction[J].IEEE Transactions on Image Processing,2010,19(7):1921-1932. [18]ZHU X,GHAHRAMANI Z,LAFFERTY J D.Semi-supervised learning using Gaussian fields and Harmonic functions[C]//Machine Learning,Proceedings of the Twentieth International Conference (ICML 2003).Washington DC,USA,2003. [19]LU G,WANG Y,ZOU J.Low-rank matrix factorization withadaptive graph regularizer [J].IEEE Transactions on Image Processing,2016,25(5):2196-2205. [20]ZHANG L,ZHANG Q,DU B,et al.Adaptive manifold regulari-zed matrix factorization for data clustering[C]//Twenty-sixth International Joint Conference on Artificial Intelligence.2017. [21]HE F,NIE F,WANG R,et al.Fast Semi-supervised learning with bipartite graph for large-scale data[J].IEEE Transactionson Neural Networks,2020,31(2):626-638. [22]LI Z,LIU J,TANG J.Robust Structured Subspace Learning for Data Representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,37(10):2085-2098. [23]BHARDWAJ A,RAMAN S.Robust PCA-based solution to ima-ge composition using augmented Lagrange multiplier (ALM)[J].Visual Computer,2016,32(5):591-600. [24]TOH K C,YUN S.An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems[J].Pacific Journal of Optimization,2010,6(3):615-640. [25]YANG J,YUAN X.Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization[J].Mathematics of Computation,2012,82(281):301-329. [26]LEE D D,SEUNG H S.Algorithms for Non-negative MatrixFactorization[C]//Neural Information Processing Systems.2000:556-562. [27]PENG C,KANG Z,HU Y,et al.Robust graph regularized nonnegative matrix factorization for clustering[J].ACM Transactions on Knowledge Discovery from Data,2017,11(3):33. [28]HE R,ZHENG W S,HU B G,et al.Nonnegative sparse coding for discriminative semi-supervised learning[C]//IEEE Confe-rence on Computer Vision & Pattern Recognition.IEEE,2011. [29]JIA Y H,KWONG S,HOU J H,et al.Semi-Supervised Non-Negative Matrix Factorization with Dissimilarity and Similarity Regularization [J].IEEE Transaction on Neural Networks and Learning Systems,2020,31(7):2510-2521. |
[1] | 杨帆, 王俊斌, 白亮. 基于安全性的成对约束扩充算法 Extended Algorithm of Pairwise Constraints Based on Security 计算机科学, 2020, 47(9): 324-329. https://doi.org/10.11896/jsjkx.200700092 |
[2] | 秦悦, 丁世飞. 半监督聚类综述 Survey of Semi-supervised Clustering 计算机科学, 2019, 46(9): 15-21. https://doi.org/10.11896/j.issn.1002-137X.2019.09.002 |
[3] | 王楠, 孙善武. 基于半监督聚类分析的无人机故障识别 UAV Fault Recognition Based on Semi-supervised Clustering 计算机科学, 2019, 46(6A): 192-195. |
[4] | 贾洪杰, 王良君, 宋和平. HMRF半监督近似核k-means算法 HMRF Semi-supervised Approximate Kernel k-means Algorithm 计算机科学, 2019, 46(12): 31-37. https://doi.org/10.11896/jsjkx.190600159 |
[5] | 程雪梅,杨秋辉,翟宇鹏,陈伟. 基于半监督聚类方法的测试用例选择技术 Test Case Selection Technique Based on Semi-supervised Clustering Method 计算机科学, 2018, 45(1): 249-254. https://doi.org/10.11896/j.issn.1002-137X.2018.01.044 |
[6] | 梁辰,李成海. 一种新的半监督入侵检测方法 Novel Intrusion Detection Method Based on Semi-supervised Clustering 计算机科学, 2016, 43(5): 87-90. https://doi.org/10.11896/j.issn.1002-137X.2016.05.016 |
[7] | 冯晨菲,杨燕,王红军,徐英歌,王韬. 一种基于数据相关性的半监督模糊聚类集成方法 Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation 计算机科学, 2015, 42(6): 41-45. https://doi.org/10.11896/j.issn.1002-137X.2015.06.009 |
[8] | 苏赢彬,杜学绘,夏春涛,曹利峰,陈华成. 基于半监督聚类的文档敏感信息推导方法 Sensitive Information Inference Method Based on Semi-supervised Document Clustering 计算机科学, 2015, 42(10): 132-137. |
[9] | 邱 烨,何振峰. 面向限制K-means算法的迭代学习分配次序策略 Iterative Learning Assignment Order for Constrained K-means Algorithm 计算机科学, 2012, 39(8): 196-198. |
[10] | 兰远东,邓辉舫,陈 涛. 基于图收缩的半监督聚类算法 Semi-supervised Clustering Algorithm Based on Graph Contraction 计算机科学, 2012, 39(4): 236-239. |
[11] | 李岩波.宋琼,郭新辰. 基于流形距离的人工免疫半监督聚类算法 Artificial Immune Clustering Semi-supervised Algorithm Based on Manifold Distance 计算机科学, 2012, 39(11): 204-207. |
[12] | 崔鹏,张汝波. 一种用于处理高维稀疏数据的半监督聚类算法 Novel Semi-supervised Clustering for High Dimensional Data 计算机科学, 2010, 37(7): 205-207. |
[13] | 李琳娜,陈海蕊,王映龙. 基于高阶逻辑的复杂结构数据半监督聚类 Semi-supervised Clustering of Complex Structured Data Based on Higher-order Logic 计算机科学, 2009, 36(9): 196-200. |
[14] | 陆伟宙 余顺争. 基于半监督聚类的Web流量分类 计算机科学, 2009, 36(2): 90-94. |
|