Computer Science ›› 2018, Vol. 45 ›› Issue (2): 109-113. doi: 10.11896/j.issn.1002-137X.2018.02.019

Fuzzy Weighted Clustering Algorithm with Fuzzy Centroid for Mixed Data

JI Jin-chao, ZHAO Xiao-wei, HE Fei, HU Ying-hui, BAI Tian and LI Zai-rong   

  • Online:2018-02-15 Published:2018-11-13

Abstract: In fuzzy c-means type algorithms, the fuzzifier parameter controls the degree of overlap allowed between clusters, but it also has the side effect that every data object tends to influence every cluster. To address this issue, Klawonn and Höppner proposed a fuzzy function to replace the fuzzifier. However, their method is designed only for numeric data, whereas in many real-world applications data objects are described by both numeric and categorical attributes. This paper proposes a novel weighted fuzzy clustering algorithm based on the fuzzy centroid (FWFC) for data with both numeric and categorical attributes, i.e., mixed data. In this method, the mean is first integrated with the fuzzy centroid to represent cluster centers. Then, a measure that accounts for the influence of different attributes on the clustering process is used to evaluate the dissimilarity between data objects and cluster centers. Finally, the algorithm for clustering data with mixed attributes is presented. The proposed algorithm was tested in a series of experiments on three mixed datasets. The experimental results show that it outperforms traditional clustering algorithms.
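The abstract does not give the formulas, so the following is only a minimal Python sketch of the two ideas it names: a cluster center that combines a membership-weighted mean (numeric attributes) with a fuzzy centroid (a weight per category, for categorical attributes), and an attribute-weighted dissimilarity between an object and such a center. The function names, the exponent `m`, the "one minus category weight" categorical distance, and the externally supplied `attr_weights` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def cluster_center(objects, memberships, numeric_idx, categorical_idx, m=2.0):
    """Build one mixed cluster center (hypothetical helper, not the paper's code).

    Numeric attributes get a membership-weighted mean; each categorical
    attribute gets a fuzzy centroid, i.e. a weight per category summing to 1.
    """
    weights = [u ** m for u in memberships]  # assumed fuzzifier exponent m
    total = sum(weights)
    means = {
        j: sum(w * obj[j] for w, obj in zip(weights, objects)) / total
        for j in numeric_idx
    }
    centroid = {}
    for j in categorical_idx:
        dist = defaultdict(float)
        for w, obj in zip(weights, objects):
            dist[obj[j]] += w / total
        centroid[j] = dict(dist)
    return means, centroid


def dissimilarity(obj, center, attr_weights):
    """Attribute-weighted dissimilarity between an object and a mixed center.

    Assumed form: squared difference on numeric attributes, and one minus the
    centroid weight of the object's category on categorical attributes.
    """
    means, centroid = center
    d = 0.0
    for j, mean in means.items():
        d += attr_weights[j] * (obj[j] - mean) ** 2
    for j, dist in centroid.items():
        d += attr_weights[j] * (1.0 - dist.get(obj[j], 0.0))
    return d


# Toy data: attribute 0 is numeric, attribute 1 is categorical.
objects = [(1.0, "red"), (3.0, "red"), (5.0, "blue")]
center = cluster_center(objects, [1.0, 1.0, 0.0], numeric_idx=[0], categorical_idx=[1])
print(center)
print(dissimilarity((4.0, "blue"), center, {0: 0.5, 1: 2.0}))
```

In this sketch the attribute weights are passed in by the caller; according to the abstract, the paper derives the influence of each attribute as part of the clustering process itself, and that step is not reproduced here.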

Key words: Fuzzy clustering, Data mining, Mixed data, Dissimilarity measure

[1] CELEBI M E, KINGRAVI H A, VELA P A. A comparative study of efficient initialization methods for the k-means clustering algorithm [J]. Expert Systems with Applications, 2013, 40(1): 200-210.
[2] BORDOGNA G, PASI G. A quality driven hierarchical data divisive soft clustering for information retrieval [J]. Knowledge-Based Systems, 2012, 26: 9-19.
[3] LI T, CORCHADO J M, SUN S, et al. Clustering for filtering: Multi-object detection and estimation using multiple/massive sensors [J]. Information Sciences, 2017, 388-389: 172-190.
[4] VERMA H, AGRAWAL R K, SHARAN A. An improved intuitionistic fuzzy c-means clustering algorithm incorporating local information for brain image segmentation [J]. Applied Soft Computing, 2016, 46: 543-557.
[5] SAEED F, SALIM N, ABDO A. Information theory and voting based consensus clustering for combining multiple clusterings of chemical structures [J]. Molecular Informatics, 2013, 32(7): 591-598.
[6] HUANG Z. Extensions to the k-means algorithm for clustering large data sets with categorical values [J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
[7] ZHANG X, MEI C, CHEN D, et al. Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy [J]. Pattern Recognition, 2016, 56(1): 1-15.
[8] HUANG Z. Clustering large data sets with mixed numeric and categorical values [C] // Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. 1997: 21-34.
[9] LI C, BISWAS G. Unsupervised learning with mixed numeric and nominal data [J]. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(4): 673-690.
[10] FOSS A, MARKATOU M, RAY B, et al. A semiparametric method for clustering mixed data [J]. Machine Learning, 2016, 105(3): 419-458.
[11] BAI L, LIANG J Y, DANG C, et al. A cluster centers initialization method for clustering categorical data [J]. Expert Systems with Applications, 2012, 39(9): 8022-8029.
[12] PANG T J, LIANG J Y. Clustering Ensemble Algorithm for Large-scale Mixed Data Based on Sampling [J]. Computer Science, 2016, 43(9): 209-212. (in Chinese)
[13] PANG T J, ZHAO X W. Algorithm to Determine Number of Clusters for Mixed Data Based on Prior Information [J]. Computer Science, 2016, 43(2): 101-104. (in Chinese)
[14] KIM D W, LEE K H, LEE D. Fuzzy clustering of categorical data using fuzzy centroids [J]. Pattern Recognition Letters, 2004, 25(11): 1263-1271.
[15] AHMAD A, DEY L. Algorithm for fuzzy clustering of mixed data with numeric and categorical attributes [M] // Distributed Computing and Internet Technology. Berlin: Springer Berlin Heidelberg, 2005: 561-572.
[16] LEE M, PEDRYCZ W. The fuzzy c-means algorithm with fuzzy p-mode prototypes for clustering objects having mixed features [J]. Fuzzy Sets and Systems, 2009, 160(24): 3590-3600.
[17] CHATZIS S P. A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional [J]. Expert Systems with Applications, 2011, 38(7): 8684-8689.
[18] KLAWONN F, HÖPPNER F. What Is Fuzzy about Fuzzy Clustering? Understanding and Improving the Concept of the Fuzzifier [M] // Advances in Intelligent Data Analysis V. Berlin: Springer Berlin Heidelberg, 2003: 254-264.
[19] AHMAD A, DEY L. A k-mean clustering algorithm for mixed numeric and categorical data [J]. Data & Knowledge Engineering, 2007, 63(2): 503-527.
[20] WITTEN I H, FRANK E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations [M]. San Francisco: Morgan Kaufmann Publishers, 1999.
[21] HUANG Z X, NG M K. A fuzzy k-modes algorithm for clustering categorical data [J]. IEEE Transactions on Fuzzy Systems, 1999, 7(4): 446-452.
