计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 324-329.doi: 10.11896/jsjkx.200700092

• 信息安全 • 上一篇    下一篇

基于安全性的成对约束扩充算法

杨帆1, 王俊斌1, 白亮1,2   

  1. 1 山西大学计算机与信息技术学院 太原030006
    2 山西大学计算机智能与中文信息处理教育部重点实验室 太原030006
  • 收稿日期:2020-04-30 发布日期:2020-09-10
  • 通讯作者: 白亮(bailiang@sxu.edu.cn)
  • 作者简介:18335149237@163.com
  • 基金资助:
    国家自然科学基金(61773247,61876103);山西省基础研究计划(201901D211192)

Extended Algorithm of Pairwise Constraints Based on Security

YANG Fan1, WANG Jun-bin1, BAI Liang1,2   

  1. 1 School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory Computation Intelligence and Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan 030006,China
  • Received:2020-04-30 Published:2020-09-10
  • About author:YANG Fan,born in 1995,postgraduate.Her main research interests include semi-supervised learning and so on.
    BAI Liang,born in 1982,Ph.D,professor,is a member of China Computer Fedaration.His main research interest include cluster analysis and so on.
  • Supported by:
    National Natural Science Foundation of China (61773247,61876103) and Technology Research Development Projects of Shanxi (201901D211192).

摘要: 基于成对约束的聚类分析是半监督学习的一个重要研究方向。成对约束的数量已成为影响该类算法有效性的重要因素。然而,在现实应用中,成对约束的获取需要耗费大量的成本。因此,文中提出了一种基于安全性的成对约束扩充方法(Extended Algorithm of Pairwise Constraints Based on Security,PCES)。该算法将传递闭包中最大局部连通距离作为安全值,并根据安全值来修改传递闭包之间的相似性,减少合并传递闭包带来的风险,最后利用图聚类方法合并相似的传递闭包达到扩充成对约束的目的。该算法不仅可以安全有效地扩充成对约束,同时可以将扩充后的成对约束应用到不同半监督聚类算法中。文中在8个基准数据集上进行了成对约束扩充算法的比较。实验结果表明,该算法可以安全有效地扩充成对约束。

关键词: 半监督聚类, 成对约束, 监督信息的扩展, 监督信息的有效性

Abstract: Cluster analysis based on pairwise constraints is an important research direction of semi-supervised learning.The number of pairwise constraints has become an important factor affecting the effectiveness of this type of the algorithm.However,in practical applications,the acquisition of pairwise constraints requires a lot of costs.Therefore,the extended algorithm of pairwise constraints based on security (PCES) is proposed.This algorithm takes the maximum local connected distance in the transitive closures as the safe value.According to the safe value,the similarity between the different transitive closures is modified to reduce the risk of merging transitive closures.Finally,the method of graph clustering is used to merge similar transitive closures to extend the pairwise constraints.This algorithm can not only safely and effectively expand pairwise constraints,but also apply the extended pairwise constraints to different semi-supervised clustering algorithms.This paper compares the extended algorithm of pairwise constraints on eight benchmark data sets.The experimental results show that the proposed algorithm can extend pairwise constraints safely and effectively.

Key words: Effectiveness of supervision information, Expansion of supervision information, Pairwise constraints, Semi-supervised clustering

中图分类号: 

  • TP391
[1] WU X,KUMAR V,QUINLAN J R,et al.Top 10 algorithms in data mining[J].Knowledge andInformation Systems,2008,14(1):1-37.
[2] JAIN A K.Data clustering:50 years beyond K-means[J].Pattern Recognition Letters,2010,31(8):651-666.
[3] GALLEGO A J,CALVO-ZARAGOZA J,VALERO-MAS J J,et al.Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation[J].Pattern Recognition,2018,74:531-543.
[4] PATERLINI S,KRINK T.Differential evolution and particleswarm optimisation in partitional clustering[J].ComputationalStatistics & Data Analysis,2006,50(5):1220-1247.
[5] LU Y,WAN Y.PHA:A fast potential-based hierarchical ag-glomerative clustering method[J].Pattern Recognition,2013,46(5):1227-1239.
[6] KUMAR K M,REDDY A R M.A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups me-thod[J].Pattern Recognition,2016,58:39-48.
[7] DINLER D,TURAL M K.Robust semi-supervised clusteringwith polyhedral and circular uncertainty[J].Neurocomputing,2017,265:4-27.
[8] ZHOU X,BELKIN M.Semi-supervised learning[J].Academic Press Library in Signal Processing,2014,1:1239-1269.
[9] ZHOU D,BOUSQUET O,LAL T N,et al.Learning with local and global consistency[J].Advances in Neural Information Processing Systems,2003,16(3):321-328.
[10] WANG H,LI T,LI T,et al.Constraint neighborhood projections for semi-supervised clustering[J].IEEETransactions on Cybernetics,2014,44(5):636-643.
[11] ZHOU Z H.Ensemble methods:foundations and algorithms[M].Boca Raton:CRC Press,2012:72-73.
[12] WAGSTAFF K,CARDIE C,ROGERS S,et al.Constrained k-means clustering with background knowledge[C]// Proceedings of the Eighteenth International Conference on Machine Lear-ning.San Francisco:Morgan Kaufmann,2001:577-584.
[13] YANG Y,TAN W,LI T,et al.Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems[J].Knowledge-Based Systems,2012,32:101-115.
[14] WEI S,LI Z,ZHANG C.A semi-supervised clustering ensemble approach integrated constraint-based and metric-based[C]//Proceedings of the 7th International Conference on Internet Multimedia Computing and Service.New York:Association for Computing Machinery,2015:1-6.
[15] LI T,DING C,JORDAN M I.Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization[C]//Seventh IEEE International Conference on Data Mi-ning (ICDM 2007).Piscataway:IEEE,2007:577-582.
[16] REN Y,HU X,SHI K,et al.Semi-supervised denpeak clustering with pairwise constraints[C]//Pacific Rim InternationalConfe-rence on Artificial Intelligence.Berlin:Springer,2018:837-850.
[17] MIYAMOTO S,TERAMI A.Constrained agglomerative hierarchical clustering algorithms with penalties [C]//2011 IEEE International Conference on Fuzzy Systems.Piscataway:IEEE,2011:422-427.
[18] KAMVAR K,SEPANDAR S,KLEIN K,et al.Spectral learning[C]//International Joint Conference of Artificial Intelligence.Stanford Info Lab,2003:561-566.
[19] XU Q,DESJARDINS M,WAGSTAFF K.Constrained spectral clustering under a local proximity structure assumption[C]//Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference.Clearwater Beach,Florida,USA:DBLP,2005:866-867.
[20] JI X,XU W.Document clustering with prior knowledge[C]//Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:Association for Computing Machinery,2006:405-412.
[21] WANG F,DING C,LI T.Integrated KL (K-means-Laplacian) clustering:a new clustering approach by combining attribute data and pairwise relations[C]//Proceedings of the 2009 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics.2009:38-48.
[22] KULIS B,BASU S,DHILLON I,et al.Semi-supervised graph clustering:a kernel approach[J].MachineLearning,2009,74(1):1-22.
[23] DING S,JIA H,DU M,et al.A semi-supervised approximate spectral clustering algorithm based on HMRF model[J].Information Sciences,2018,429:215-228.
[24] KLEIN D,KAMVAR S D,MANNING C D.From instance-level constraints to space-level constraints:Making the most of prior knowledge in data clustering[C]//International Conference on Machine Learning.Stanford:IMLS,2002:307-314.
[25] LU Z,PENG Y.Exhaustive and efficient constraint propaga-tion:A graph-based learning approach and its applications[J].International Journal of Computer Vision,2013,103(3):306-325.
[26] WANG L,BO L F,JIAO L C.Density-sensitive semi-supervised spectral clustering[J].Journal of Software,2007,18(10):2412-2422.
[27] WEI S,LI Z,ZHANG C.Combined constraint-based with metric-based in semi-supervised clustering ensemble[J].International Journal of Machine Learning and Cybernetics,2018,9(7):1085-1100.
[28] YU Z,KUANG Z,LIU J,et al.Adaptive ensembling of semi-supervised clustering solutions[J].IEEE Transactions onKnow-ledge and Data Engineering,2017,29(8):1577-1590.
[29] EISENBERG D,MARCOTTE E M,XENARIOS I,et al.Protein function in the post-genomic era[J].Nature,2000,405(6788):823-826.
[30] WAGSTAFF K,CARDIE C.Clustering with instance-level constraints[J].AAAI/IAAI,2000,1097:577-584.
[31] TENENBAUM J B,DE SILVA V,LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5500):2319-2323.
[32] CLAUSET A,NEWMAN M E J,MOORE C.Finding community structure in very large networks[J].Physical Review E,2004,70(6):066111.
[33] DOM B E.An information-theoretic external cluster-validitymeasure[J].Uncertainty in Artificial Intelligence,2001,27(3):137-145.
[34] HUBERT L,ARABIE P.Comparing partitions[J].Journal of Classification,1985,2(1):193-218.
[1] 赵敏, 刘惊雷.
基于高斯场和自适应图正则的半监督聚类
Semi-supervised Clustering Based on Gaussian Fields and Adaptive Graph Regularization
计算机科学, 2021, 48(7): 137-144. https://doi.org/10.11896/jsjkx.200800190
[2] 秦悦, 丁世飞.
半监督聚类综述
Survey of Semi-supervised Clustering
计算机科学, 2019, 46(9): 15-21. https://doi.org/10.11896/j.issn.1002-137X.2019.09.002
[3] 王楠, 孙善武.
基于半监督聚类分析的无人机故障识别
UAV Fault Recognition Based on Semi-supervised Clustering
计算机科学, 2019, 46(6A): 192-195.
[4] 贾洪杰, 王良君, 宋和平.
HMRF半监督近似核k-means算法
HMRF Semi-supervised Approximate Kernel k-means Algorithm
计算机科学, 2019, 46(12): 31-37. https://doi.org/10.11896/jsjkx.190600159
[5] 程雪梅,杨秋辉,翟宇鹏,陈伟.
基于半监督聚类方法的测试用例选择技术
Test Case Selection Technique Based on Semi-supervised Clustering Method
计算机科学, 2018, 45(1): 249-254. https://doi.org/10.11896/j.issn.1002-137X.2018.01.044
[6] 梁辰,李成海.
一种新的半监督入侵检测方法
Novel Intrusion Detection Method Based on Semi-supervised Clustering
计算机科学, 2016, 43(5): 87-90. https://doi.org/10.11896/j.issn.1002-137X.2016.05.016
[7] 王纵虎,刘速.
一种成对约束限制的半监督文本聚类算法
Pairwise Constrained Semi-supervised Text Clustering Algorithm
计算机科学, 2016, 43(12): 183-188. https://doi.org/10.11896/j.issn.1002-137X.2016.12.033
[8] 冯晨菲,杨燕,王红军,徐英歌,王韬.
一种基于数据相关性的半监督模糊聚类集成方法
Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation
计算机科学, 2015, 42(6): 41-45. https://doi.org/10.11896/j.issn.1002-137X.2015.06.009
[9] 苏赢彬,杜学绘,夏春涛,曹利峰,陈华成.
基于半监督聚类的文档敏感信息推导方法
Sensitive Information Inference Method Based on Semi-supervised Document Clustering
计算机科学, 2015, 42(10): 132-137.
[10] 邱 烨,何振峰.
面向限制K-means算法的迭代学习分配次序策略
Iterative Learning Assignment Order for Constrained K-means Algorithm
计算机科学, 2012, 39(8): 196-198.
[11] 兰远东,邓辉舫,陈 涛.
基于图收缩的半监督聚类算法
Semi-supervised Clustering Algorithm Based on Graph Contraction
计算机科学, 2012, 39(4): 236-239.
[12] 齐鸣鸣,向阳.
融合稀疏保持的成对约束投影
Pairwise Constraint Projections Inosculating Sparsity Preserving
计算机科学, 2012, 39(11): 212-215.
[13] 李岩波.宋琼,郭新辰.
基于流形距离的人工免疫半监督聚类算法
Artificial Immune Clustering Semi-supervised Algorithm Based on Manifold Distance
计算机科学, 2012, 39(11): 204-207.
[14] 崔鹏,张汝波.
一种用于处理高维稀疏数据的半监督聚类算法
Novel Semi-supervised Clustering for High Dimensional Data
计算机科学, 2010, 37(7): 205-207.
[15] 李琳娜,陈海蕊,王映龙.
基于高阶逻辑的复杂结构数据半监督聚类
Semi-supervised Clustering of Complex Structured Data Based on Higher-order Logic
计算机科学, 2009, 36(9): 196-200.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!