Clustering Ensemble Algorithm for Large-scale Mixed Data Based on Sampling

doi:10.11896/j.issn.1002-137X.2016.09.041

Abstract: Mixed data clustering is an important problem in cluster analysis. Existing algorithms mainly rely on similarity measurements computed over all samples, so they are inefficient on large-scale data. We therefore design a new sampling strategy and propose a sampling-based clustering ensemble algorithm for large-scale mixed data. The algorithm clusters each subset obtained with the new sampling strategy separately, and then derives the final clustering result through clustering ensemble. Experiments show that, compared with the modified K-prototypes algorithm, the efficiency of the algorithm improves significantly while the clustering validity indexes remain almost the same.
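The sample-cluster-ensemble pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: plain random sampling stands in for the paper's sampling strategy (which the abstract does not describe), the base clusterer is a simple k-prototypes-style loop with a hypothetical weight `gamma`, and the ensemble step is label alignment followed by majority voting. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations

def dist(a, b, gamma=1.0):
    """k-prototypes-style dissimilarity: squared Euclidean distance on the
    numeric part plus gamma times the number of categorical mismatches."""
    (na, ca), (nb, cb) = a, b
    num = sum((x - y) ** 2 for x, y in zip(na, nb))
    cat = sum(x != y for x, y in zip(ca, cb))
    return num + gamma * cat

def nearest(p, protos, gamma):
    """Index of the prototype closest to point p."""
    return min(range(len(protos)), key=lambda j: dist(p, protos[j], gamma))

def k_prototypes(data, k, gamma=1.0, iters=20, rng=random):
    """Cluster (numeric_tuple, categorical_tuple) points; return prototypes."""
    protos = rng.sample(data, k)
    for _ in range(iters):
        labels = [nearest(p, protos, gamma) for p in data]
        new = []
        for j in range(k):
            members = [p for p, l in zip(data, labels) if l == j]
            if not members:
                new.append(protos[j])  # keep the old prototype if a cluster empties
                continue
            # Numeric attributes use the mean, categorical attributes the mode.
            means = tuple(sum(col) / len(members)
                          for col in zip(*(m[0] for m in members)))
            modes = tuple(Counter(col).most_common(1)[0][0]
                          for col in zip(*(m[1] for m in members)))
            new.append((means, modes))
        if new == protos:
            break
        protos = new
    return protos

def align(labels, ref, k):
    """Relabel `labels` with the permutation that agrees most with `ref`.
    Brute force over k! permutations, so only suitable for small k."""
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[l] == r for l, r in zip(labels, ref)))
    return [best[l] for l in labels]

def sampled_ensemble(data, k, n_subsets=5, subset_size=50, gamma=1.0, seed=0):
    """Cluster several random subsets, extend each result to all points by
    nearest prototype, then combine the labelings by majority vote."""
    rng = random.Random(seed)
    all_labels = []
    for _ in range(n_subsets):
        # Assumption: simple random sampling, not the paper's strategy.
        subset = rng.sample(data, min(subset_size, len(data)))
        protos = k_prototypes(subset, k, gamma, rng=rng)
        labels = [nearest(p, protos, gamma) for p in data]
        if all_labels:
            labels = align(labels, all_labels[0], k)
        all_labels.append(labels)
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_labels)]
```

Each base clustering only touches `subset_size` points during the iterative step, and the full data set is scanned once per subset for the final assignment. That is the intuition behind the efficiency gain the abstract reports, although the actual savings depend on the sampling strategy and ensemble method used in the paper.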

Key words: Clustering, Large-scale mixed data, Clustering ensembles, Sampling, Validity index

PANG Tian-jie and LIANG Ji-ye. Clustering Ensemble Algorithm for Large-scale Mixed Data Based on Sampling[J]. Computer Science, 2016, 43(9): 209-212.

