Computer Science, 2015, Vol. 42, Issue (3): 261-265. doi: 10.11896/j.issn.1002-137X.2015.03.054

• Artificial Intelligence •

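Clustering with Mixed Condition Attributes Based on Average Mutual Information

LIU Jin-sheng

  1. College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming 525000, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011A090200088), the Maoming Science and Technology Plan Project of Guangdong Province (2012B009), and the Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis.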

Abstract: The distance values of mixed condition attribute parameters differ greatly in magnitude. As a result, clustering tends to group only the objects with numeric condition attributes, whose distances are larger in magnitude and more regular, and to ignore the objects with categorical condition attributes, whose distances are smaller in magnitude and more chaotic but whose category characteristics are more pronounced. A clustering algorithm based on average mutual information is proposed. First, entropy is used to quantify the category characteristics of each parameter. Then the average mutual information, computed from entropy, measures how much category information data objects share and how much differs, which brings the distances of numeric and categorical condition attribute parameters to the same order of magnitude. Finally, the clustering result is obtained through an optimized, iterative, adaptive process. Experimental results show that the proposed algorithm achieves good clustering quality and adaptability.

Key words: Mixed condition attributes, Average mutual information, Clustering
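
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, assuming the textbook definitions of Shannon entropy and average mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y). The helper `equal_width_bins` and the toy attributes (`color`, `shape`, `size`, `weight`) are hypothetical and are not taken from the paper. The sketch shows how a numeric attribute, once discretized, can be compared with a categorical attribute on the same scale (bits), regardless of the raw magnitude of the numeric values.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Average mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def equal_width_bins(values, k):
    """Map numeric values to k equal-width bin indices.

    Hypothetical helper: the discretization scheme is an assumption,
    not the method used in the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero for constant data
    return [min(int((v - lo) / width), k - 1) for v in values]


# Toy data: four objects with three categorical and one numeric attribute.
color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]
size = ["s", "l", "s", "l"]
weight = [1.0, 1.2, 9.8, 10.0]

# Fully dependent categorical attributes share all of their information.
mi_dependent = mutual_information(color, shape)
# Independent categorical attributes share none.
mi_independent = mutual_information(color, size)
# A discretized numeric attribute is measured in the same unit (bits).
mi_mixed = mutual_information(color, equal_width_bins(weight, 2))

print(mi_dependent, mi_independent, mi_mixed)
```

Because every quantity here is expressed in bits, the numeric attribute no longer dominates merely because its raw values are larger. How the paper turns these quantities into a distance and an iterative clustering procedure is not specified in the abstract, so that step is left out.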

