基于内聚度和耦合度的二分K均值方法

计算机科学 ›› 2018, Vol. 45 ›› Issue (6A): 460-464.

基于内聚度和耦合度的二分K均值方法

郁湧^1,2,康庆怡¹,陈长赓¹,阚世林¹,骆永军¹

云南大学软件学院昆明650504¹
云南省软件工程重点实验室昆明650504²

出版日期:2018-06-20 发布日期:2018-08-03
作者简介:郁湧(1980-),男,博士,副教授,CCF会员,主要研究方向为软件工程、数据分析,E-mail:yuy1219@163.com。
基金资助:
国家自然科学基金项目(61462091),云南大学数据驱动的软件工程省科技创新团队项目(2017HC012)资助

Bisecting K-means Clustering Method Based on Cohesion and Coupling

YU Yong^1,2,KANG Qing-yi¹,CHEN Chang-geng¹,KAN Shi-lin¹,LUO Yong-jun¹

School of Software,Yunnan University,Kunming 650504,China¹
Key Laboratory for Software Engineering of Yunnan Province,Kunming 650504,China²

Online:2018-06-20 Published:2018-08-03

摘要/Abstract

摘要： 聚类分析是数据挖掘中最重要的技术之一,它在社会经济的各个领域都具有重要作用,并被广泛应用。K均值算法是最经典、应用最广泛的聚类方法之一,但其缺点是过度依赖初始条件和聚类数目难以确定,这制约了其应用范围。引入簇的内聚度和耦合度的定义与度量方法,基于“高内聚低耦合”的原理,在二分K均值聚类过程中对得到的簇进行不断的分裂和合并,并判断聚类结果是否满足要求以确定聚类的次数和簇的个数,从而实现对二分K均值聚类过程的改进。在Iris数据集上的实验测试与分析表明该算法不仅更加稳定,而且其聚类结果的正确率也较高。

关键词: 二分k均值, 聚类, 内聚度, 耦合度

Abstract: Clustering analysis is one of the most important techniques in data mining.It has important role and wide application in every field of social economy.K-means is one kind of the simple and widely used clustering methods,but its disadvantage is that it depends on the initial conditions and the number of clusters is difficult to determine.This paper introduced the cohesion and coupling of cluster,and presented the measurement of cohesion and coupling.Based on the principle of “high cohesion and low coupling”,the clusters are constantly divided and merged in the process of bisecting K-Means clustering algorithm.By judging whether the clustering results meet the requirements,it can determine the number of clusters,thus improving the bisecting K-Means clustering algorithm.The experimental results on Iris data show that the algorithm is not only more stable,but also has higher clustering accuracy.

Key words: Bisecting K-means, Clustering, Cohesion, Coupling

中图分类号:

TP391

郁湧,康庆怡,陈长赓,阚世林,骆永军. 基于内聚度和耦合度的二分K均值方法[J]. 计算机科学, 2018, 45(6A): 460-464. https://doi.org/

YU Yong,KANG Qing-yi,CHEN Chang-geng,KAN Shi-lin,LUO Yong-jun. Bisecting K-means Clustering Method Based on Cohesion and Coupling[J]. Computer Science, 2018, 45(6A): 460-464. https://doi.org/

参考文献

[1]HAN J W,KAMBER M,PEI J.Data mining:concepts and techniques(3rd ed)[M].Burlington:Elsevier Science,2011.
[2]ILLHOI Y,HU X H.A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE[C]∥Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries.New York,USA:ACM,2006:220-229.
[3]SILVA J D A,HRUSCHKA E R.Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters[C]∥International Conference on Machine Learning and Applications and Workshops.2011:14-19.
[4]SAVARESI S M,BOLEY D.On the Performance of Bisecting K-Means and PDDP[C]∥Proc.of the 1st SIAM International Conference on Data Mining.Chicago,USA:2001:1-14.
[5]刘广聪,黄婷婷,陈海南.改进的二分K均值聚类算法[J].计算机应用与软件,2015,32(2):261-263.
[6]VAMSI K B S,SATHEESH P,SUNEEL K R.Comparative Study of K-means and Bisecting K-means Techniques in Wordnet Based Document Clustering[J].International Journal of Engineering and Advanced Technology,2012,1(6):119-234.
[7]张军伟,王念滨,黄少滨,等.二分K均值聚类算法优化及并行研究[J].计算机工程,2011,37(17):23-25.
[8]裘国永,张娇.基于二分K-均值的SVM决策树自适应分类方法[J].计算机应用研究,2012,29(10):3685-3709.
[9]STEINBACH M,KARYPIS G,KUMAR V.A Comparison of Document Clustering Techniques[C]∥Proc.of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Boston,USA,2000:525-526.
[10]LIU X Z,FENG G C.Kernel Bisecting K-Means Cluster- ing for SVM Training Sample Reduction[C]∥Proc.of the 19th International Conference on Pattern Recognition.Tampa,USA,2008:1-4.
[11]戴东波,汤春蕾,熊赟.基于整体和局部相似性的序列聚类算法[J].软件学报,2010,21(4):702-717.

相关文章 15

[1]	柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203
[2]	鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[3]	鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于DBSCAN聚类的集群联邦学习方法 Clustered Federated Learning Methods Based on DBSCAN Clustering 计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059
[4]	郁舒昊, 周辉, 叶春杨, 王太正. SDFA:基于多特征融合的船舶轨迹聚类方法研究 SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion 计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253
[5]	毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞. 基于密度敏感距离和模糊划分的改进FCM算法 FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition 计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042
[6]	陈景年. 一种适于多分类问题的支持向量机加速方法 Acceleration of SVM for Multi-class Classification 计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149
[7]	刘丽, 李仁发. 医疗CPS协作网络控制策略优化 Control Strategy Optimization of Medical CPS Cooperative Network 计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230
[8]	陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳. 三维城市场景中的小物体检测 Small Object Detection in 3D Urban Scenes 计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174
[9]	邢云冰, 龙广玉, 胡春雨, 忽丽莎. 基于SVM的类别增量人体活动识别方法 Human Activity Recognition Method Based on Class Increment SVM 计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024
[10]	朱哲清, 耿海军, 钱宇华. 面向化学结构的线段聚类算法 Line-Segment Clustering Algorithm for Chemical Structure 计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131
[11]	张宇姣, 黄锐, 张福泉, 隋栋, 张虎. 基于菌群优化的近邻传播聚类算法研究 Study on Affinity Propagation Clustering Algorithm Based on Bacterial Flora Optimization 计算机科学, 2022, 49(5): 165-169. https://doi.org/10.11896/jsjkx.210800218
[12]	左园林, 龚月姣, 陈伟能. 成本受限条件下的社交网络影响最大化方法 Budget-aware Influence Maximization in Social Networks 计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228
[13]	杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞. 基于节点相似性和网络嵌入的复杂网络社区发现算法 Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding 计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[14]	韩洁, 陈俊芬, 李艳, 湛泽聪. 基于自注意力的自监督深度聚类算法 Self-supervised Deep Clustering Algorithm Based on Self-attention 计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001
[15]	蒲实, 赵卫东. 一种面向动态科研网络的社区检测算法 Community Detection Algorithm for Dynamic Academic Network 计算机科学, 2022, 49(1): 89-94. https://doi.org/10.11896/jsjkx.210100023

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed