Computer Science, 2018, Vol. 45, Issue (2): 109-113. doi: 10.11896/j.issn.1002-137X.2018.02.019

• 2017 China Computer Federation (CCF) Conference on Artificial Intelligence

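Fuzzy Weighted Clustering Algorithm with Fuzzy Centroid for Mixed Data

JI Jin-chao, ZHAO Xiao-wei, HE Fei, HU Ying-hui, BAI Tian and LI Zai-rong

  • Affiliations, in author order:
    1. School of Information Science and Technology, Northeast Normal University, Changchun 130117; Institute of Computational Biology, Northeast Normal University, Changchun 130117
    2. School of Information Science and Technology, Northeast Normal University, Changchun 130117; Institute of Computational Biology, Northeast Normal University, Changchun 130117
    3. School of Information Science and Technology, Northeast Normal University, Changchun 130117; Institute of Computational Biology, Northeast Normal University, Changchun 130117
    4. School of Information Science and Technology, Northeast Normal University, Changchun 130117
    5. College of Computer Science and Technology, Jilin University, Changchun 130012
    6. School of Media Science, Northeast Normal University, Changchun 130117
  • Online: 2018-02-15  Published: 2018-11-13
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61502093, 61403077), the Scientific Research Project of the Education Department of Jilin Province (2016504), and the Science and Technology Development Plan of Jilin Province.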

Abstract: In fuzzy c-means-type algorithms, the fuzzifier controls the degree to which clusters may overlap, but it has the side effect that every data object influences every cluster. To address this issue, Klawonn and Höppner replaced the fuzzifier with a fuzzy function (the KH algorithm); however, their method was designed only for numeric data. In many real-world applications, data objects are described by both numeric and categorical attributes, i.e., mixed data. This paper proposes a novel fuzzy weighted clustering algorithm based on fuzzy centroids (FWFC) for mixed data. First, the mean is combined with the fuzzy centroid to represent cluster centers under mixed attributes. Then, a measure that accounts for the role of each attribute in the clustering process is used to evaluate the dissimilarity between data objects and cluster centers. Finally, the algorithmic framework is presented. In a series of experiments on three mixed datasets, the proposed algorithm outperformed traditional clustering algorithms.

Keywords: Fuzzy clustering, Data mining, Mixed data, Dissimilarity measure
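To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' FWFC implementation. It shows (a) why the standard fuzzy c-means membership formula gives every object a non-zero membership in every cluster, (b) a mixed cluster center built from a membership-weighted mean for a numeric attribute and a fuzzy centroid (one weight per category) for a categorical attribute, and (c) a per-attribute weighted dissimilarity. All function names, the toy data, and the attribute weights are hypothetical; in particular, the attribute weights are simply passed in, because the abstract does not say how the paper derives them.

```python
from collections import defaultdict


def fcm_memberships(sq_dists, m=2.0):
    """Standard fuzzy c-means memberships of one object, given its squared
    distances to every cluster center (all assumed > 0) and fuzzifier m > 1."""
    inv = [d ** (-1.0 / (m - 1.0)) for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]


def numeric_center(values, memberships, m=2.0):
    """Membership-weighted mean of one numeric attribute."""
    w = [u ** m for u in memberships]
    return sum(wi * x for wi, x in zip(w, values)) / sum(w)


def fuzzy_centroid(values, memberships, m=2.0):
    """Fuzzy centroid of one categorical attribute: a weight per category,
    with the weights summing to 1."""
    acc = defaultdict(float)
    for x, u in zip(values, memberships):
        acc[x] += u ** m
    total = sum(acc.values())
    return {cat: s / total for cat, s in acc.items()}


def dissimilarity(obj, center, attr_weights):
    """Weighted dissimilarity between a mixed object and a mixed center.
    Numeric parts use the squared difference; categorical parts use the
    centroid weight placed on categories other than the object's own.
    attr_weights is a hypothetical, user-supplied weight per attribute."""
    total = 0.0
    for x, c, w in zip(obj, center, attr_weights):
        if isinstance(c, dict):          # categorical attribute
            total += w * (1.0 - c.get(x, 0.0))
        else:                            # numeric attribute
            total += w * (x - c) ** 2
    return total


# Toy data (hypothetical): three objects with one numeric and one
# categorical attribute, and their memberships in a single cluster.
ages = [20.0, 30.0, 40.0]
colors = ["red", "red", "blue"]
u_cluster = [1.0, 0.5, 0.5]

center = [numeric_center(ages, u_cluster), fuzzy_centroid(colors, u_cluster)]
d = dissimilarity((20.0, "red"), center, attr_weights=[0.01, 1.0])

# Even a cluster 100 times farther away (in squared distance) keeps a
# strictly positive membership under the standard formula.
u_obj = fcm_memberships([1.0, 100.0])
print(center, d, u_obj)
```

The categorical term treats the fuzzy centroid as a distribution over categories, so an object whose category carries most of the centroid's weight is close to the center. The last lines illustrate the side effect described in the abstract: the standard membership formula never returns exactly zero for a finite distance.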

[1] CELEBI M E,KINGRAVI H A,VELA P A.A comparative study of efficient initialization methods for the k-means clustering algorithm[J].Expert Systems with Applications,2013,40(1):200-210.
[2] BORDOGNA G,PASI G.A quality driven hierarchical data divisive soft clustering for information retrieval[J].Knowledge-Based Systems,2012,26:9-19.
[3] LI T,CORCHADO J M,SUN S,et al.Clustering for filtering:Multi-object detection and estimation using multiple/massive sensors [J].Information Sciences,2017(388-389):172-190.
[4] VERMA H,AGRAWAL R K,SHARAN A.An improved intuitionistic fuzzy c-means clustering algorithm incorporating local information for brain image segmentation[J].Applied Soft Computing,2016,46:543-557.
[5] SAEED F,SALIM N,ABDO A.Information theory and voting based consensus clustering for combining multiple clusterings of chemical structures [J].Molecular Informatics,2013,32(7):591-598.
[6] HUANG Z.Extensions to the k-means algorithm for clustering large data sets with categorical values [J].Data Mining and Knowledge Discovery,1998,2(3):283-304.
[7] ZHANG X,MEI C,CHEN D,et al.Feature selection in mixed data:A method using a novel fuzzy rough set-based information entropy [J].Pattern Recognition,2016,56(1):1-15.
[8] HUANG Z.Clustering large data sets with mixed numeric and categorical values [C]∥Proceedings of the first Pacific-Asia Conference on Knowledge Discovery and Data Mining.1997:21-34.
[9] LI C,BISWAS G.Unsupervised learning with mixed numeric and nominal data[J].IEEE Transactions on Knowledge and Data Engineering,2002,14(4):673-690.
[10] FOSS A,MARKATOU M,RAY B,et al.A semiparametric method for clustering mixed data [J].Machine Learning,2016,105(3):419-458.
[11] BAI L,LIANG J Y,DANG C,et al.A cluster centers initialization method for clustering categorical data [J].Expert Systems with Applications,2012,39(9):8022-8029.
[12] PANG T J,LIANG J Y.Clustering Ensemble Algorithm for Large-scale Mixed Data Based on Sampling[J].Computer Science,2016,43(9):209-212.(in Chinese)
[13] PANG T J,ZHAO X W.Algorithm to Determine Number of Clusters for Mixed Data Based on Prior Information [J].Computer Science,2016,43(2):101-104.(in Chinese)
[14] KIM D W,LEE K H,LEE D.Fuzzy clustering of categorical data using fuzzy centroids [J].Pattern Recognition Letters,2004,25(11):1263-1271.
[15] AHMAD A,DEY L.Algorithm for fuzzy clustering of mixed data with numeric and categorical attributes [M]∥Distributed Computing and Internet Technology.Berlin:Springer Berlin Heidelberg,2005:561-572.
[16] LEE M,PEDRYCZ W.The fuzzy c-means algorithm with fuzzy p-mode prototypes for clustering objects having mixed features [J].Fuzzy Sets and Systems,2009,160(24):3590-3600.
[17] CHATZIS S P.A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional [J].Expert Systems with Applications,2011,38(7):8684-8689.
[18] KLAWONN F,HÖPPNER F.What Is Fuzzy about Fuzzy Clustering? Understanding and Improving the Concept of the Fuzzifier [M]∥ Advances in Intelligent Data Analysis V.Berlin:Springer Berlin Heidelberg,2003:254-264.
[19] AHMAD A,DEY L.A k-mean clustering algorithm for mixed numeric and categorical data [J].Data & Knowledge Engineering,2007,63(2):503-527.
[20] WITTEN I H,FRANK E.Data Mining:Practical Machine Learning Tools and Techniques with Java Implementations [M].San Francisco:Morgan Kaufmann Publishers,1999.
[21] HUANG Z X,NG M K.A fuzzy k-modes algorithm for clustering categorical data [J].IEEE Transactions on Fuzzy Systems,1999,7(4):446-452.
