Computer Science ›› 2015, Vol. 42 ›› Issue (6): 41-45. doi: 10.11896/j.issn.1002-137X.2015.06.009

• The 10th Joint Conference on Harmonious Human Machine Environment •

Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation

FENG Chen-fei, YANG Yan, WANG Hong-jun, XU Ying-ge, WANG Tao

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61170111,2) and the Independent Research Project of the State Key Laboratory of Traction Power, Southwest Jiaotong University (2012TPL_T15).

Abstract: Existing semi-supervised clustering ensemble methods use prior information to improve the accuracy, robustness and stability of the ensemble. However, when pairwise constraints are incorporated in the ensemble step, most of these methods consider only the given constraints and ignore the relationship between a constraining point and the neighbors of the point it constrains. To address this problem, a semi-supervised fuzzy clustering ensemble approach with data correlation (SFCEDC) is proposed. First, an ensemble information matrix is built from the results of a semi-supervised fuzzy clustering algorithm and converted into a similarity matrix. Next, the similarity matrix is modified using both the given constraints and the relationships between the constraining points and the neighbors of the constrained points. Finally, a graph partitioning algorithm is applied to obtain the final clustering result. Experimental results on real UCI datasets show that the proposed approach effectively improves clustering quality.

Key words: Semi-supervised clustering ensemble, Fuzzy clustering, Pairwise constraints, Neighbor points
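
The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the similarity matrix is taken as the average of U·Uᵀ over the base fuzzy membership matrices, the constraint update (hypothetical parameters `k` and `alpha`) pulls the similarity between a constrained point and the k nearest neighbors of its partner toward 1 (must-link) or 0 (cannot-link), and the graph partitioning step is replaced by a simple two-way spectral cut. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def similarity_from_memberships(memberships):
    """Average U @ U.T over base fuzzy partitions (each U is n x c, rows sum to 1)."""
    n = memberships[0].shape[0]
    S = np.zeros((n, n))
    for U in memberships:
        S += U @ U.T
    S /= len(memberships)
    np.fill_diagonal(S, 1.0)  # assumption: every point is fully similar to itself
    return S

def k_nearest(X, i, k):
    """Indices of the k points closest to X[i] in Euclidean distance, excluding i."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return np.argsort(d)[:k]

def apply_constraints(S, X, must_link, cannot_link, k=2, alpha=0.5):
    """Assumed update rule, not the paper's: blend neighbor similarities toward
    the constraint target, then write the given constraints as hard values."""
    S = S.copy()
    groups = ((must_link, 1.0), (cannot_link, 0.0))
    # Step 1: propagate each constraint (i, j) to the neighbors of i and of j.
    for pairs, target in groups:
        for i, j in pairs:
            for a, b in ((i, j), (j, i)):
                for nb in k_nearest(X, b, k):
                    if nb != a:
                        v = (1 - alpha) * S[a, nb] + alpha * target
                        S[a, nb] = S[nb, a] = v
    # Step 2: the given constraints themselves always win.
    for pairs, target in groups:
        for i, j in pairs:
            S[i, j] = S[j, i] = target
    return S

def spectral_bipartition(S):
    """Two-way cut from the second eigenvector of the normalized Laplacian
    (a stand-in for the graph partitioning algorithm mentioned in the abstract)."""
    inv_sqrt_d = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(S.shape[0]) - inv_sqrt_d[:, None] * S * inv_sqrt_d[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = vecs[:, 1] * inv_sqrt_d
    return (fiedler > 0).astype(int)
```

Propagation runs before the hard assignment so that a neighbor update can never overwrite a constraint that was given explicitly. For more than two clusters, the last function would have to be replaced by a k-way partitioner.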
