计算机科学 ›› 2021, Vol. 48 ›› Issue (10): 197-203.doi: 10.11896/jsjkx.200900061

• 数据库&大数据&数据科学 • 上一篇    下一篇

结合多目标优化算法的模糊聚类有效性指标及应用

崔国楠1, 王立松1, 康介祥2, 高忠杰2, 王辉2, 尹伟2   

  1. 1 南京航空航天大学计算机科学与技术学院 南京210000
    2 中国航空无线电电子研究院软件部 上海200233
  • 收稿日期:2020-09-08 修回日期:2021-03-22 出版日期:2021-10-15 发布日期:2021-10-18
  • 通讯作者: 王立松(wangls@nuaa.edu.cn)
  • 作者简介:xkcaor@163.com
  • 基金资助:
    国家自然科学基金(J19K004)

Fuzzy Clustering Validity Index Combined with Multi-objective Optimization Algorithm and Its Application

CUI Guo-nan1, WANG Li-song1, KANG Jie-xiang2, GAO Zhong-jie2, WANG Hui2, YIN Wei2   

  1. 1 School of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 210000,China
    2 Software Department of China Aeronautical Radio Electronics Research Institute,Shanghai 200233,China
  • Received:2020-09-08 Revised:2021-03-22 Online:2021-10-15 Published:2021-10-18
  • About author:CUI Guo-nan,born in 1996,postgra-duate.His main research interests include multi-object optimization method and data mining.
    WANG Li-song,born in 1969,associate professor,is a member of China Computer Federation.His main research interests include avionics safety analysis,data management in distributed environments,formal methods and model-based safety analysis,and wireless sensor network.
  • Supported by:
    National Natural Science Foundation of China(J19K004).

摘要: 模糊聚类方法可以更有效地对复杂数据集进行分析,由于模糊聚类算法的种类繁多且聚类结果会随着输入的聚类个数的不同而改变,使得模糊聚类算法产生的结果不准确,因此,要获得准确的聚类结果必须确定模糊聚类个数k。目前已有的研究主要是利用多种模糊聚类有效性指标来确定最优聚类个数k,但是诸如SSD,PBM等模糊聚类指标会随着划分的聚类个数k的增加而单调递减,导致聚类个数k不准确。为此,文中提出了一种结合多目标优化算法的模糊聚类有效性指标(A Validity Index of Fuzzy Clustering Combined with Multi-objective Optimization Algorithm,OSACF),将模糊聚类度量指标与多目标优化算法(Multi-Objective Optimization Algorithm,MOEA)相结合来解决聚类最优个数k的问题。与使用聚类有效性指标不同,OSACF通过建立聚类个数k与聚类度量指标之间的双目标模型并使用MOEA优化该双目标模型来确定最优聚类个数k,避免了聚类有效性指标趋于单调递减的影响。另一方面,OSACF使用形态形似距离替代传统的欧氏距离度量,避免了聚类形状对计算聚类k值的影响。实验结果表明,OSACF结合MOEA得到的最优模糊聚类个数k比已有的聚类有效性指标获得的结果更准确。

关键词: 多目标优化算法, 聚类有效性指标, 模糊聚类, 模糊聚类个数k

Abstract: Fuzzy clustering method can analyze complex data sets more effectively.Because there are many kinds of fuzzy clustering algorithms and the clustering results will change with the number of input clusters,the results of fuzzy clustering algorithm are not accurate,so the number of fuzzy clustering k must be determined in order to obtain certain clustering results.At present,the existing research mainly uses a variety of fuzzy clustering effectiveness indexes to determine the optimal number of clusters k.However,fuzzy clustering indexes such as SSD,PBM will decrease monotonically with the increase of clustering number k,which makes it impossible to determine the optimal number of clusters k.Therefore,this paper proposes a fuzzy clustering validity index (OSACF) combined with a multi-objective optimization algorithm,which combines fuzzy clustering validity with a multi-objective optimization algorithm (MOEA) to solve the optimal number of clusters k problem.Different from using clustering validity index,OSACF establishes a bi-objective model between cluster number k and clustering validity index,and uses MOEA to optimize the bi-objective model to determine the optimal cluster number k,so as to avoid the influence of monotonous decreasing of clustering validity index.On the other hand,OSACF uses morphological similarity distance to replace the traditional Euclidean distance metric,which avoids the influence of cluster shape on the calculation of cluster k.The experimental results show that the optimal fuzzy cluster number k obtained by OSACF combined with MOEA is more accurate than the results obtained by the existing clustering effectiveness indicators.

Key words: Clustering validity index, Fuzzy clustering, Multi-objective optimization algorithm, Number of clusters k

中图分类号: 

  • TP302
[1]BEZDEK J C,EHRLICH R,FULL W.FCM:The fuzzy c-means clustering algorithm[J].Computers & Geosciences,1984,10(2/3):191-203.
[2]ZHANG P Z,ZHANG H Y.A Review of Features and Labels Dimensionality Reduction Methods of Multi Label Data[J].Journal of Chongqing Technology and Business University(Na-tural Science Edition),2020,37(5):23-29.
[3]WANG Z H,WANG S Y,DU H.Improved Fuzzy C-meansClustering Algorithm Based on Density-Sensitive Distance[J].Computer Engineering,2021,47(5):88-96,103.
[4]GAN G,MA C,WU J.Data clustering:theory,algorithms,and applications[M].Society for Industrial and Applied Mathema-tics,2020.
[5]MATHER P,TSO B.Classification methods for remotely sensed data[M].CRC Press,2016.
[6]CUI H,ZHANG K,FANG Y,et al.A clustering validity index based on pairing frequency[J].IEEE Access,2017,5:24884-24894.
[7]VAIDYA J,SHAFIQ B,BASU A,et al.Differentially privatenaive bayes classification[C]//2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT).IEEE,2013,1:571-576.
[8]BEZDEK J C.Numerical taxonomy with fuzzy sets[J].Journal of Mathematical Biology,1974,1(1):57-71.
[9]BEZDEK J C.Cluster validity with fuzzy sets[J].Journal of Cybernetics,1973,3:58-73.
[10]XIE X L,BENI G.A validity measure for fuzzy clustering[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1991,13(8):841-847.
[11]FUKUYAMA Y.A new method of choosing the number ofclusters for the fuzzy c-mean method[C]//Proc.5th Fuzzy Syst.Symp.,1989.1989:247-250.
[12]ZAHID N,LIMOURI M,ESSAID A.A new cluster-validity for fuzzy clustering[J].Pattern Recognition,1999,32(7):1089-1097.
[13]PAKHIRA M K,BANDYOPADHYAY S,MAULIK U.Vali-dity index for crisp and fuzzy clusters[J].Pattern Recognition,2004,37(3):487-501.
[14]PAKHIRA M K,BANDYOPADHYAY S,MAULIK U.Astudy of some fuzzy cluster validity indices,genetic clustering and application to pixel classification[J].Fuzzy Sets and Systems,2005,155(2):191-214.
[15]WANG R,LAI S,WU G,et al.Multi-clustering via evolutionary multi-objective optimization[J].Information Sciences,2018,450:128-140.
[16]ZHANG Y,WANG W,ZHANG X,et al.A cluster validity index for fuzzy clustering[J].Information Sciences,2008,178(4):1205-1218.
[17]WU K L,YANG M S.A cluster validity index for fuzzy clustering[J].Pattern Recognition Letters,2005,26(9):1275-1291.
[18]MIRJALILI S,JANGIR P,SAREMI S.Multi-objective ant lion optimizer:a multi-objective optimization algorithm for solving engineering problems[J].Applied Intelligence,2017,46(1):79-95.
[19]YANG X S.Nature-inspired optimization algorithms[M].Academic Press,2020.
[20]GUO Y,WENG G.K-means++ clustering-based active contour model for fast image segmentation[J].Journal of Electronic Imaging,2018,27(6):063013.
[21]LI Z,YUAN J,ZHANG W.Fuzzy C-mean algorithm with morphology similarity distance[C]//2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery.IEEE,2009,3:90-94.
[22]YANG S,LI K,LIANG Z,et al.A novel cluster validity index for fuzzy c-means algorithm[J].Soft Computing,2018,22(6):1921-1931.
[23]CAI X,MEI Z,FAN Z.A decomposition-based many-objective evolutionary algorithm with two types of adjustments for direction vectors[J].IEEE Transactions on Cybernetics,2017,48(8):2335-2348.
[24]WANG L,CUI G,ZHOU Q,et al.A multi-clustering method based on evolutionary multiobjective optimization with grid decomposition[J].Swarm and Evolutionary Computation,2020,55:100691.
[25]REZAEE B.A cluster validity index for fuzzy clustering[J].Fuzzy Sets and Systems,2010,161(23):3014-3025.
[26]DAVIES D L,BOULDIN D W.A cluster separation measure[J].IEEE transactions on Pattern Analysis and Machine Intelligence,1979(2):224-227.
[1] 张亚迪, 孙悦, 刘锋, 朱二周.
结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究
Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index
计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148
[2] 乔颖婧, 高保禄, 史瑞雪, 刘璇, 王朝辉.
融合Tamura纹理特征的改进FCM脑MRI图像分割算法
Improved FCM Brain MRI Image Segmentation Algorithm Based on Tamura Texture Feature
计算机科学, 2021, 48(8): 111-117. https://doi.org/10.11896/jsjkx.200700003
[3] 张天柱, 邹承明.
使用模糊聚类的胶囊网络在图像分类上的研究
Study on Image Classification of Capsule Network Using Fuzzy Clustering
计算机科学, 2019, 46(12): 279-285. https://doi.org/10.11896/jsjkx.190200315
[4] 冀进朝,赵晓威,何飞,胡英慧,白天,李在荣.
基于模糊质心的混合属性数据模糊加权聚类算法
Fuzzy Weighted Clustering Algorithm with Fuzzy Centroid for Mixed Data
计算机科学, 2018, 45(2): 109-113. https://doi.org/10.11896/j.issn.1002-137X.2018.02.019
[5] 邢瑞康, 李成海.
基于直觉模糊集理论的IDS方法研究
Research on Intrusion Detection System Method Based on Intuitionistic Fuzzy Sets
计算机科学, 2018, 45(11A): 344-348.
[6] 郑奇斌,刁兴春,曹建军.
结合缺失模式的不完整数据模糊聚类
Fuzzy Clustering Algorithm for Incomplete Data Considering Missing Pattern
计算机科学, 2017, 44(12): 58-63. https://doi.org/10.11896/j.issn.1002-137X.2017.12.011
[7] 耿宗科,王长宾,张振国.
基于模糊c-means与自适应粒子群优化的模糊聚类算法
Fuzzy c-means and Adaptive PSO Based Fuzzy Clustering Algorithm
计算机科学, 2016, 43(8): 267-272. https://doi.org/10.11896/j.issn.1002-137X.2016.08.054
[8] 陈小东,孙力娟,韩崇,郭剑.
基于模糊聚类的数据流概念漂移检测算法
Detecting Concept Drift of Data Stream Based on Fuzzy Clustering
计算机科学, 2016, 43(4): 219-223. https://doi.org/10.11896/j.issn.1002-137X.2016.04.045
[9] 伍大清,郑建国,朱佳俊,孙 莉.
基于人类社交行为的动态多目标优化
Dynamic Multi-objective Particle Swarm Optimization Algorithm Based on Human Social Behavior
计算机科学, 2015, 42(8): 249-252.
[10] 冯晨菲,杨燕,王红军,徐英歌,王韬.
一种基于数据相关性的半监督模糊聚类集成方法
Semi-supervised Fuzzy Clustering Ensemble Approach with Data Correlation
计算机科学, 2015, 42(6): 41-45. https://doi.org/10.11896/j.issn.1002-137X.2015.06.009
[11] 郭华峰,赵建民,潘修强.
自适应抑制式模糊C-回归模型算法
Adapting Suppressed Fuzzy C-regression Models Algorithm
计算机科学, 2015, 42(2): 274-276. https://doi.org/10.11896/j.issn.1002-137X.2015.02.057
[12] 董世龙,陈宁江,谭瑛,何子龙,朱莉蓉.
面向云环境的集群资源模糊聚类划分算法的优化
Optimization of Cluster Resource Fuzzy Clustering Partition Algorithm for Cloud Computing
计算机科学, 2014, 41(9): 104-109. https://doi.org/10.11896/j.issn.1002-137X.2014.09.020
[13] 张千,梁鸿,郉永山.
云计算环境下基于模糊聚类的并行调度策略研究
Cloud Parallel Task Scheduling Algorithm Based on Fuzzy Clustering
计算机科学, 2014, 41(8): 75-80. https://doi.org/10.11896/j.issn.1002-137X.2014.08.016
[14] 沈洁,郭莉森.
社会网络的模糊聚类理论及其实践
Social Network’s Fuzzy Cluster Theory and its Applications
计算机科学, 2013, 40(Z11): 63-67.
[15] 刘颖,张柏,王爱莲,桑娟,何咏梅.
一种基于半监督集成SVM的土地覆盖分类模型
Ensemble Model with Semisupervised SVM for Remote Sensing Land Cover Classification
计算机科学, 2013, 40(7): 206-210.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!