计算机科学 ›› 2018, Vol. 45 ›› Issue (6A): 460-464.
郁湧1,2,康庆怡1,陈长赓1,阚世林1,骆永军1
YU Yong1,2,KANG Qing-yi1,CHEN Chang-geng1,KAN Shi-lin1,LUO Yong-jun1
摘要: 聚类分析是数据挖掘中最重要的技术之一,它在社会经济的各个领域都具有重要作用,并被广泛应用。K均值算法是最经典、应用最广泛的聚类方法之一,但其缺点是过度依赖初始条件和聚类数目难以确定,这制约了其应用范围。引入簇的内聚度和耦合度的定义与度量方法,基于“高内聚低耦合”的原理,在二分K均值聚类过程中对得到的簇进行不断的分裂和合并,并判断聚类结果是否满足要求以确定聚类的次数和簇的个数,从而实现对二分K均值聚类过程的改进。在Iris数据集上的实验测试与分析表明该算法不仅更加稳定,而且其聚类结果的正确率也较高。
中图分类号:
[1]HAN J W,KAMBER M,PEI J.Data mining:concepts and techniques(3rd ed)[M].Burlington:Elsevier Science,2011. [2]ILLHOI Y,HU X H.A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE[C]∥Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries.New York,USA:ACM,2006:220-229. [3]SILVA J D A,HRUSCHKA E R.Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters[C]∥International Conference on Machine Learning and Applications and Workshops.2011:14-19. [4]SAVARESI S M,BOLEY D.On the Performance of Bisecting K-Means and PDDP[C]∥Proc.of the 1st SIAM International Conference on Data Mining.Chicago,USA:2001:1-14. [5]刘广聪,黄婷婷,陈海南.改进的二分K均值聚类算法[J].计算机应用与软件,2015,32(2):261-263. [6]VAMSI K B S,SATHEESH P,SUNEEL K R.Comparative Study of K-means and Bisecting K-means Techniques in Wordnet Based Document Clustering[J].International Journal of Engineering and Advanced Technology,2012,1(6):119-234. [7]张军伟,王念滨,黄少滨,等.二分K均值聚类算法优化及并行研究[J].计算机工程,2011,37(17):23-25. [8]裘国永,张娇.基于二分K-均值的SVM决策树自适应分类方法[J].计算机应用研究,2012,29(10):3685-3709. [9]STEINBACH M,KARYPIS G,KUMAR V.A Comparison of Document Clustering Techniques[C]∥Proc.of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Boston,USA,2000:525-526. [10]LIU X Z,FENG G C.Kernel Bisecting K-Means Cluster- ing for SVM Training Sample Reduction[C]∥Proc.of the 19th International Conference on Pattern Recognition.Tampa,USA,2008:1-4. [11]戴东波,汤春蕾,熊赟.基于整体和局部相似性的序列聚类算法[J].软件学报,2010,21(4):702-717. |
[1] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[2] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[3] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于DBSCAN聚类的集群联邦学习方法 Clustered Federated Learning Methods Based on DBSCAN Clustering 计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059 |
[4] | 郁舒昊, 周辉, 叶春杨, 王太正. SDFA:基于多特征融合的船舶轨迹聚类方法研究 SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion 计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253 |
[5] | 毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞. 基于密度敏感距离和模糊划分的改进FCM算法 FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition 计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042 |
[6] | 陈景年. 一种适于多分类问题的支持向量机加速方法 Acceleration of SVM for Multi-class Classification 计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149 |
[7] | 刘丽, 李仁发. 医疗CPS协作网络控制策略优化 Control Strategy Optimization of Medical CPS Cooperative Network 计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230 |
[8] | 陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳. 三维城市场景中的小物体检测 Small Object Detection in 3D Urban Scenes 计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174 |
[9] | 邢云冰, 龙广玉, 胡春雨, 忽丽莎. 基于SVM的类别增量人体活动识别方法 Human Activity Recognition Method Based on Class Increment SVM 计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024 |
[10] | 朱哲清, 耿海军, 钱宇华. 面向化学结构的线段聚类算法 Line-Segment Clustering Algorithm for Chemical Structure 计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131 |
[11] | 张宇姣, 黄锐, 张福泉, 隋栋, 张虎. 基于菌群优化的近邻传播聚类算法研究 Study on Affinity Propagation Clustering Algorithm Based on Bacterial Flora Optimization 计算机科学, 2022, 49(5): 165-169. https://doi.org/10.11896/jsjkx.210800218 |
[12] | 左园林, 龚月姣, 陈伟能. 成本受限条件下的社交网络影响最大化方法 Budget-aware Influence Maximization in Social Networks 计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228 |
[13] | 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞. 基于节点相似性和网络嵌入的复杂网络社区发现算法 Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding 计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009 |
[14] | 韩洁, 陈俊芬, 李艳, 湛泽聪. 基于自注意力的自监督深度聚类算法 Self-supervised Deep Clustering Algorithm Based on Self-attention 计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001 |
[15] | 蒲实, 赵卫东. 一种面向动态科研网络的社区检测算法 Community Detection Algorithm for Dynamic Academic Network 计算机科学, 2022, 49(1): 89-94. https://doi.org/10.11896/jsjkx.210100023 |
|