Clustering with Mixed Condition Attributes Based on Average Mutual Information

doi:10.11896/j.issn.1002-137X.2015.03.054

Abstract

The distances computed over the parameters of mixed condition attributes differ greatly in magnitude. Objects with numeric condition attributes, whose distances are large and regular in magnitude, tend to dominate the clustering. Objects with categorical condition attributes, whose distances are small and irregular in magnitude, are ignored even when they have obvious category characteristics. To address this, a clustering algorithm based on average mutual information is proposed. First, the strength of each parameter's category characteristics is quantified through entropy. Then, the similarity and difference between category characteristics are measured by the average mutual information of the entropies, so that the distances of numeric and categorical condition attribute parameters are brought to a common magnitude. Finally, the clustering result is obtained through an iterative, adaptive optimization process. Experimental results show that the proposed algorithm achieves high clustering quality and good adaptability.
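As a rough illustration of the two information-theoretic quantities the abstract names, the sketch below computes the Shannon entropy of one categorical attribute and the average mutual information between two categorical attributes from observed frequencies. It uses only the textbook definitions; the function names `entropy` and `average_mutual_information`, the choice of base-2 logarithms, and the plain-list input are assumptions made for this example and are not taken from the paper, whose actual distance definition and iteration scheme are not given in the abstract.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    # H(X) = -sum_x p(x) * log2 p(x), with p(x) estimated by relative frequency.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_mutual_information(xs, ys):
    """Average mutual information I(X;Y), in bits, of two paired sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) ).
    # With frequency estimates, the ratio simplifies to c * n / (c_x * c_y).
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )
```

Under these definitions, two attributes that always take matching values share all of their information, so the average mutual information equals the entropy of either attribute, while two statistically independent attributes have an average mutual information of zero.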

Key words: Mixed condition attributes, Average mutual information, Clustering

LIU Jin-sheng. Clustering with Mixed Condition Attributes Based on Average Mutual Information[J]. Computer Science, 2015, 42(3): 261-265.

