Computer Science, 2020, Vol. 47, Issue (6A): 457-460. doi: 10.11896/JsJkx.190700044

• Database & Big Data & Data Science •

Selective Clustering Ensemble Based on Xie-Beni Index

SHAO Chao and MA Jin-Jia

  1. School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Published: 2020-07-07
  • Corresponding author: SHAO Chao (sc_flying@163.com)
  • About author: SHAO Chao, born in 1977, professor, is a member of China Computer Federation. His main research interests include machine learning, among others.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61806073, 61907011).

Abstract: Selective clustering ensemble selects a subset of the base clustering results with high accuracy and large diversity and integrates them, so as to obtain a more effective clustering ensemble result. However, the accuracy of a clustering result is difficult to measure objectively, so in cluster analysis a cluster validity index is used to measure the goodness of the clustering results. This paper proposes a selective clustering ensemble algorithm based on the Xie-Beni index. The algorithm uses the Xie-Beni index to measure the validity of the base clustering results and combines it with NMI (normalized mutual information) to select the better base clustering results for integration, thereby improving the accuracy of the clustering result. Experimental results confirm the effectiveness of the algorithm.

Key words: Clustering validity index, NMI, Selective clustering ensemble, Xie-Beni
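
The two measures named in the abstract can be illustrated with a minimal sketch. It assumes hard (crisp) cluster labels, for which the Xie-Beni index reduces to the total squared distance of points to their cluster centroids divided by n times the smallest squared distance between two centroids (lower is better), and it assumes the geometric-mean normalization for NMI. The function names, the parameters `keep` and `max_nmi`, and the greedy selection rule are hypothetical choices made for illustration; the abstract does not specify how the paper combines the two measures.

```python
import math
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, labels):
    """Crisp Xie-Beni index: compactness / (n * min centroid separation)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {l: [sum(c) / len(ps) for c in zip(*ps)]
                 for l, ps in clusters.items()}
    compactness = sum(sqdist(p, centroids[l]) for p, l in zip(points, labels))
    keys = list(centroids)
    separation = min(sqdist(centroids[a], centroids[b])
                     for i, a in enumerate(keys) for b in keys[i + 1:])
    if separation == 0:
        return math.inf  # coinciding centroids: worst possible score
    return compactness / (len(points) * separation)


def nmi(a, b):
    """Normalized mutual information, normalized by sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def select_base_clusterings(points, partitions, keep=3, max_nmi=0.95):
    """Hypothetical rule: visit partitions from best to worst Xie-Beni score
    and skip any that is a near-duplicate (NMI >= max_nmi) of a chosen one."""
    ranked = sorted(range(len(partitions)),
                    key=lambda i: xie_beni(points, partitions[i]))
    chosen = []
    for i in ranked:
        if all(nmi(partitions[i], partitions[j]) < max_nmi for j in chosen):
            chosen.append(i)
        if len(chosen) == keep:
            break
    return chosen


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]       # matches the true grouping
relabeled = [1, 1, 1, 0, 0, 0]  # same partition, different label names
bad = [0, 1, 0, 1, 0, 1]        # mixes the two groups
selected = select_base_clusterings(points, [bad, good, relabeled], keep=2)
```

On this toy data the well-separated partition receives a much lower Xie-Beni score than the mixed one, and the relabeled copy is skipped because its NMI with the already chosen partition is 1. The selected base clusterings would then be passed to a consensus function, which this sketch does not cover.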

CLC Number:

  • TP181