计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 324-329.doi: 10.11896/jsjkx.200700092
杨帆1, 王俊斌1, 白亮1,2
YANG Fan1, WANG Jun-bin1, BAI Liang1,2
摘要: 基于成对约束的聚类分析是半监督学习的一个重要研究方向。成对约束的数量已成为影响该类算法有效性的重要因素。然而,在现实应用中,成对约束的获取需要耗费大量的成本。因此,文中提出了一种基于安全性的成对约束扩充方法(Extended Algorithm of Pairwise Constraints Based on Security,PCES)。该算法将传递闭包中最大局部连通距离作为安全值,并根据安全值来修改传递闭包之间的相似性,减少合并传递闭包带来的风险,最后利用图聚类方法合并相似的传递闭包达到扩充成对约束的目的。该算法不仅可以安全有效地扩充成对约束,同时可以将扩充后的成对约束应用到不同半监督聚类算法中。文中在8个基准数据集上进行了成对约束扩充算法的比较。实验结果表明,该算法可以安全有效地扩充成对约束。
中图分类号:
[1] WU X,KUMAR V,QUINLAN J R,et al.Top 10 algorithms in data mining[J].Knowledge andInformation Systems,2008,14(1):1-37. [2] JAIN A K.Data clustering:50 years beyond K-means[J].Pattern Recognition Letters,2010,31(8):651-666. [3] GALLEGO A J,CALVO-ZARAGOZA J,VALERO-MAS J J,et al.Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation[J].Pattern Recognition,2018,74:531-543. [4] PATERLINI S,KRINK T.Differential evolution and particleswarm optimisation in partitional clustering[J].ComputationalStatistics & Data Analysis,2006,50(5):1220-1247. [5] LU Y,WAN Y.PHA:A fast potential-based hierarchical ag-glomerative clustering method[J].Pattern Recognition,2013,46(5):1227-1239. [6] KUMAR K M,REDDY A R M.A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups me-thod[J].Pattern Recognition,2016,58:39-48. [7] DINLER D,TURAL M K.Robust semi-supervised clusteringwith polyhedral and circular uncertainty[J].Neurocomputing,2017,265:4-27. [8] ZHOU X,BELKIN M.Semi-supervised learning[J].Academic Press Library in Signal Processing,2014,1:1239-1269. [9] ZHOU D,BOUSQUET O,LAL T N,et al.Learning with local and global consistency[J].Advances in Neural Information Processing Systems,2003,16(3):321-328. [10] WANG H,LI T,LI T,et al.Constraint neighborhood projections for semi-supervised clustering[J].IEEETransactions on Cybernetics,2014,44(5):636-643. [11] ZHOU Z H.Ensemble methods:foundations and algorithms[M].Boca Raton:CRC Press,2012:72-73. [12] WAGSTAFF K,CARDIE C,ROGERS S,et al.Constrained k-means clustering with background knowledge[C]// Proceedings of the Eighteenth International Conference on Machine Lear-ning.San Francisco:Morgan Kaufmann,2001:577-584. [13] YANG Y,TAN W,LI T,et al.Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems[J].Knowledge-Based Systems,2012,32:101-115. [14] WEI S,LI Z,ZHANG C.A semi-supervised clustering ensemble approach integrated constraint-based and metric-based[C]//Proceedings of the 7th International Conference on Internet Multimedia Computing and Service.New York:Association for Computing Machinery,2015:1-6. [15] LI T,DING C,JORDAN M I.Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization[C]//Seventh IEEE International Conference on Data Mi-ning (ICDM 2007).Piscataway:IEEE,2007:577-582. [16] REN Y,HU X,SHI K,et al.Semi-supervised denpeak clustering with pairwise constraints[C]//Pacific Rim InternationalConfe-rence on Artificial Intelligence.Berlin:Springer,2018:837-850. [17] MIYAMOTO S,TERAMI A.Constrained agglomerative hierarchical clustering algorithms with penalties [C]//2011 IEEE International Conference on Fuzzy Systems.Piscataway:IEEE,2011:422-427. [18] KAMVAR K,SEPANDAR S,KLEIN K,et al.Spectral learning[C]//International Joint Conference of Artificial Intelligence.Stanford Info Lab,2003:561-566. [19] XU Q,DESJARDINS M,WAGSTAFF K.Constrained spectral clustering under a local proximity structure assumption[C]//Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference.Clearwater Beach,Florida,USA:DBLP,2005:866-867. [20] JI X,XU W.Document clustering with prior knowledge[C]//Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:Association for Computing Machinery,2006:405-412. [21] WANG F,DING C,LI T.Integrated KL (K-means-Laplacian) clustering:a new clustering approach by combining attribute data and pairwise relations[C]//Proceedings of the 2009 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics.2009:38-48. [22] KULIS B,BASU S,DHILLON I,et al.Semi-supervised graph clustering:a kernel approach[J].MachineLearning,2009,74(1):1-22. [23] DING S,JIA H,DU M,et al.A semi-supervised approximate spectral clustering algorithm based on HMRF model[J].Information Sciences,2018,429:215-228. [24] KLEIN D,KAMVAR S D,MANNING C D.From instance-level constraints to space-level constraints:Making the most of prior knowledge in data clustering[C]//International Conference on Machine Learning.Stanford:IMLS,2002:307-314. [25] LU Z,PENG Y.Exhaustive and efficient constraint propaga-tion:A graph-based learning approach and its applications[J].International Journal of Computer Vision,2013,103(3):306-325. [26] WANG L,BO L F,JIAO L C.Density-sensitive semi-supervised spectral clustering[J].Journal of Software,2007,18(10):2412-2422. [27] WEI S,LI Z,ZHANG C.Combined constraint-based with metric-based in semi-supervised clustering ensemble[J].International Journal of Machine Learning and Cybernetics,2018,9(7):1085-1100. [28] YU Z,KUANG Z,LIU J,et al.Adaptive ensembling of semi-supervised clustering solutions[J].IEEE Transactions onKnow-ledge and Data Engineering,2017,29(8):1577-1590. [29] EISENBERG D,MARCOTTE E M,XENARIOS I,et al.Protein function in the post-genomic era[J].Nature,2000,405(6788):823-826. [30] WAGSTAFF K,CARDIE C.Clustering with instance-level constraints[J].AAAI/IAAI,2000,1097:577-584. [31] TENENBAUM J B,DE SILVA V,LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5500):2319-2323. [32] CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E,2004,70(6):066111. [33] DOM B E.An information-theoretic external cluster-validitymeasure[J].Uncertainty in Artificial Intelligence,2001,27(3):137-145. [34] HUBERT L,ARABIE P.Comparing partitions[J].Journal of Classification,1985,2(1):193-218. |
[1] | 赵敏, 刘惊雷. 基于高斯场和自适应图正则的半监督聚类 Semi-supervised Clustering Based on Gaussian Fields and Adaptive Graph Regularization 计算机科学, 2021, 48(7): 137-144. https://doi.org/10.11896/jsjkx.200800190 |
[2] | 秦悦, 丁世飞. 半监督聚类综述 Survey of Semi-supervised Clustering 计算机科学, 2019, 46(9): 15-21. https://doi.org/10.11896/j.issn.1002-137X.2019.09.002 |
[3] | 王楠, 孙善武. 基于半监督聚类分析的无人机故障识别 UAV Fault Recognition Based on Semi-supervised Clustering 计算机科学, 2019, 46(6A): 192-195. |
[4] | 贾洪杰, 王良君, 宋和平. HMRF半监督近似核k-means算法 HMRF Semi-supervised Approximate Kernel k-means Algorithm 计算机科学, 2019, 46(12): 31-37. https://doi.org/10.11896/jsjkx.190600159 |
[5] | 程雪梅,杨秋辉,翟宇鹏,陈伟. 基于半监督聚类方法的测试用例选择技术 Test Case Selection Technique Based on Semi-supervised Clustering Method 计算机科学, 2018, 45(1): 249-254. https://doi.org/10.11896/j.issn.1002-137X.2018.01.044 |
[6] | 梁辰,李成海. 一种新的半监督入侵检测方法 Novel Intrusion Detection Method Based on Semi-supervised Clustering 计算机科学, 2016, 43(5): 87-90. https://doi.org/10.11896/j.issn.1002-137X.2016.05.016 |
[7] | 王纵虎,刘速. 一种成对约束限制的半监督文本聚类算法 Pairwise Constrained Semi-supervised Text Clustering Algorithm 计算机科学, 2016, 43(12): 183-188. https://doi.org/10.11896/j.issn.1002-137X.2016.12.033 |
[8] | 冯晨菲,杨燕,王红军,徐英歌,王韬. 一种基于数据相关性的半监督模糊聚类集成方法 Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation 计算机科学, 2015, 42(6): 41-45. https://doi.org/10.11896/j.issn.1002-137X.2015.06.009 |
[9] | 苏赢彬,杜学绘,夏春涛,曹利峰,陈华成. 基于半监督聚类的文档敏感信息推导方法 Sensitive Information Inference Method Based on Semi-supervised Document Clustering 计算机科学, 2015, 42(10): 132-137. |
[10] | 邱 烨,何振峰. 面向限制K-means算法的迭代学习分配次序策略 Iterative Learning Assignment Order for Constrained K-means Algorithm 计算机科学, 2012, 39(8): 196-198. |
[11] | 兰远东,邓辉舫,陈 涛. 基于图收缩的半监督聚类算法 Semi-supervised Clustering Algorithm Based on Graph Contraction 计算机科学, 2012, 39(4): 236-239. |
[12] | 齐鸣鸣,向阳. 融合稀疏保持的成对约束投影 Pairwise Constraint Projections Inosculating Sparsity Preserving 计算机科学, 2012, 39(11): 212-215. |
[13] | 李岩波.宋琼,郭新辰. 基于流形距离的人工免疫半监督聚类算法 Artificial Immune Clustering Semi-supervised Algorithm Based on Manifold Distance 计算机科学, 2012, 39(11): 204-207. |
[14] | 崔鹏,张汝波. 一种用于处理高维稀疏数据的半监督聚类算法 Novel Semi-supervised Clustering for High Dimensional Data 计算机科学, 2010, 37(7): 205-207. |
[15] | 李琳娜,陈海蕊,王映龙. 基于高阶逻辑的复杂结构数据半监督聚类 Semi-supervised Clustering of Complex Structured Data Based on Higher-order Logic 计算机科学, 2009, 36(9): 196-200. |
|