Computer Science, 2016, Vol. 43, Issue (2): 101-104. doi: 10.11896/j.issn.1002-137X.2016.02.023


Algorithm to Determine Number of Clusters for Mixed Data Based on Prior Information

PANG Tian-jie and ZHAO Xing-wang   

Online: 2018-12-01; Published: 2018-12-01

Abstract: In cluster analysis, one of the most challenging problems is determining the number of clusters. Most existing methods choose the initial prototypes randomly when determining the number of clusters, which makes the iterations of the clustering process unstable. We therefore propose an algorithm that determines the number of clusters for mixed data using prior information: the prior information, which includes class labels, is used to optimize the initial prototypes. Experiments show that the algorithm is effective.

Key words: Clustering analysis, Number of clusters, Mixed data, Prior information, Max-min distance
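The abstract gives only the outline of the method. The following is a minimal sketch of one plausible way to combine its ingredients (mixed data, class labels as prior information, and max-min distance). It is not the authors' algorithm. It rests on three assumptions:

- The dissimilarity is a k-prototypes-style measure: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of categorical mismatches.
- Each labeled class seeds one initial prototype, using the numeric mean and the categorical mode of that class.
- Further prototypes are added with the classic max-min distance rule, controlled by a threshold factor `theta`.

All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical representation of one mixed-data object:
# (numeric attribute values, categorical attribute values).
MixedRecord = tuple[tuple[float, ...], tuple[str, ...]]


def mixed_distance(a: MixedRecord, b: MixedRecord, gamma: float = 1.0) -> float:
    """Assumed k-prototypes-style dissimilarity: squared Euclidean distance on
    the numeric part plus gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * mismatches


def seed_prototypes(labeled: list[tuple[MixedRecord, str]]) -> list[MixedRecord]:
    """Build one initial prototype per known class label (the prior information):
    attribute-wise mean for numeric values, most frequent value for categorical ones."""
    groups: dict[str, list[MixedRecord]] = {}
    for record, label in labeled:
        groups.setdefault(label, []).append(record)
    prototypes = []
    for label in sorted(groups):
        members = groups[label]
        numeric = tuple(mean(col) for col in zip(*(m[0] for m in members)))
        categorical = tuple(Counter(col).most_common(1)[0][0]
                            for col in zip(*(m[1] for m in members)))
        prototypes.append((numeric, categorical))
    return prototypes


def estimate_num_clusters(data: list[MixedRecord],
                          labeled: list[tuple[MixedRecord, str]],
                          theta: float = 0.5,
                          gamma: float = 1.0) -> tuple[int, list[MixedRecord]]:
    """Max-min distance growth starting from label-derived seeds instead of a
    random start. A new prototype is added while the object farthest from its
    nearest prototype lies more than theta times the mean pairwise prototype
    distance away."""
    if theta < 0:
        raise ValueError("theta must be non-negative")
    prototypes = seed_prototypes(labeled)
    if len(prototypes) < 2:
        raise ValueError("this sketch needs labeled objects from at least two classes")
    while data:
        # Distance from every object to its nearest current prototype.
        nearest = [min(mixed_distance(x, p, gamma) for p in prototypes) for x in data]
        farthest = max(range(len(data)), key=nearest.__getitem__)
        reference = mean(mixed_distance(p, q, gamma)
                         for p, q in combinations(prototypes, 2))
        if nearest[farthest] <= theta * reference:
            break  # no object is far enough from the existing prototypes
        prototypes.append(data[farthest])
    return len(prototypes), prototypes


# Toy example: classes "A" and "B" are known; a third group is unlabeled.
labeled = [
    (((1.0, 1.0), ("red",)), "A"),
    (((1.2, 0.8), ("red",)), "A"),
    (((5.0, 5.0), ("blue",)), "B"),
]
unlabeled = [
    ((0.9, 1.1), ("red",)),
    ((5.2, 4.9), ("blue",)),
    ((10.0, 0.0), ("green",)),
    ((10.3, 0.2), ("green",)),
]
data = [record for record, _ in labeled] + unlabeled
k, prototypes = estimate_num_clusters(data, labeled)
print(k)  # 3 for this toy data with theta = 0.5
```

Seeding from class labels makes the starting prototypes deterministic, which targets the instability of random initialization described in the abstract. In this sketch the final count still depends on the assumed `theta` and `gamma`.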

