Computer Science ›› 2016, Vol. 43 ›› Issue (2): 101-104. doi: 10.11896/j.issn.1002-137X.2016.02.023

• 2015 China Computer Federation Conference on Artificial Intelligence •

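Algorithm to Determine Number of Clusters for Mixed Data Based on Prior Information

PANG Tian-jie, ZHAO Xing-wang

  1. Department of Computer Science, Taiyuan Normal University, Jinzhong 030619; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China project "Research on the Theory and Methods of Sparse Representation of 'User Behavior Data'" (61273294), and by the Shanxi Province Research Funding for Returned Overseas Scholars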

Abstract: Determining the number of clusters is a challenging problem in cluster analysis. Most existing methods for determining the number of clusters select the initial cluster centers at random, which makes the number of iterations in the clustering process unstable. To address this, an algorithm for determining the number of clusters in mixed data is proposed, which uses prior information containing class labels to optimize the initial cluster centers. Experiments show that the algorithm is effective.

Key words: Cluster analysis, Number of clusters, Mixed data, Prior information, Max-min distance
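
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way labeled prior information and the max-min distance named in the key words could be combined to pick initial centers for mixed data. Everything in it is an assumption for illustration rather than the authors' algorithm: the function names (`mixed_distance`, `labeled_prototypes`, `max_min_centers`), the k-prototypes-style dissimilarity (squared Euclidean on numeric attributes plus a weight `gamma` times the number of categorical mismatches), and the use of per-class mean/mode prototypes as seeds.

```python
from collections import Counter


def mixed_distance(a, b, num_idx, cat_idx, gamma=1.0):
    """Assumed dissimilarity: squared Euclidean on numeric attributes
    plus gamma times the number of mismatching categorical attributes."""
    numeric = sum((a[i] - b[i]) ** 2 for i in num_idx)
    categorical = sum(a[i] != b[i] for i in cat_idx)
    return numeric + gamma * categorical


def labeled_prototypes(labeled, num_idx, cat_idx):
    """Build one seed per class label from (sample, label) pairs:
    mean of each numeric attribute, mode of each categorical attribute."""
    groups = {}
    for sample, label in labeled:
        groups.setdefault(label, []).append(sample)
    prototypes = []
    for members in groups.values():
        proto = list(members[0])
        for i in num_idx:
            proto[i] = sum(m[i] for m in members) / len(members)
        for i in cat_idx:
            proto[i] = Counter(m[i] for m in members).most_common(1)[0][0]
        prototypes.append(tuple(proto))
    return prototypes


def max_min_centers(data, seeds, k_max, num_idx, cat_idx, gamma=1.0):
    """Extend the seed centers up to k_max centers with the max-min rule:
    repeatedly add the point whose distance to its nearest center is largest."""
    centers = list(seeds)
    if not centers:
        raise ValueError("at least one seed center is required")

    def nearest(x):
        # Distance from x to the closest center chosen so far.
        return min(mixed_distance(x, c, num_idx, cat_idx, gamma) for c in centers)

    while len(centers) < k_max:
        candidate = max(data, key=nearest)
        if nearest(candidate) == 0:
            break  # every point already coincides with a center
        centers.append(tuple(candidate))
    return centers
```

Because the seeds come from labeled samples and the remaining centers are chosen deterministically, repeated runs start from the same centers, which is the kind of stability the abstract contrasts with random initialization. How the candidate numbers of clusters are then evaluated is not described in the abstract and is left out of this sketch.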

