计算机科学 ›› 2014, Vol. 41 ›› Issue (Z6): 406-408.

• 数据挖掘 • 上一篇    下一篇

一种具有优良抗噪性能的初始聚类质心选择算法

马仕玉,李益才,蓝章礼   

  1. 重庆交通大学信息科学与工程学院 重庆400074;重庆交通大学信息科学与工程学院 重庆400074;重庆交通大学信息科学与工程学院 重庆400074
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受重庆市交通委员会科学计划项目:基于RFID的车辆非法营运监控与特征提取资助

Novel Anti-noise K-means Algorithm Based on Spatial Distance Difference

MA Shi-yu,LI Yi-cai and LAN Zhang-li   

  • Online:2018-11-14 Published:2018-11-14

摘要: K-means算法由于其固有的初始聚类质心敏感性,存在聚类结果不稳定、容易收敛到局部最优等问题。现有改进方案在处理无噪数据集时能够在降低迭代次数的同时得到近似全局最优解,但在处理有噪数据集时容易陷入局部最优,甚至聚类效果低于传统的K-means算法。在最远空间距离确定初始质心算法的基础上,提出一种基于空间距离差的初始质心选择算法。该算法的核心思想是通过计算非聚类质心点到已选质心的距离和,并排序,选取相邻距离差最大的两点中靠近已知质心的点作为下一个簇的初始质心而实现的。实验结果表明,所提算法在聚类迭代次数相当的情况下,对不含噪声数据集的聚类准确度增加约1%,对于含有噪声的数据集,聚类准确度达到90%以上。

关键词: K-means算法,初始质心,空间距离差,噪声数据 中图法分类号TP391文献识别码A

Abstract: Due to the inherent initial clustering center sensitivity of K-means algorithm,it exists problems including result instability and being easy to fall into local optimum.The current improvement schemes can reduce the number of iteration and obtain an approximate global optimal solution when deal with noise-free data sets.But for noisy data sets,it would be easy to fall into local optimum,and the clustering result is lower than traditional K-means algorithm.Based on the algorithm that can find initial clustering centers according to the farthest spatial distance,the paper proposed a novel algorithm to select initial centers based on spatial distance difference.The main idea of the algorithm is calculating the sum distances between non-clustering center and all selected centers,then sort them.Choose the point which is the closer to the given centers as the new selected cluster center.Experimental results show that under the quite condition of iteration,when deal with noise-free data sets,the clustering accuracy of the proposed algorithm is improved about 1%.For noisy data sets,the classified accuracy is above 90%.

Key words: K-means algorithm,Initial centroid,Spatial distance difference,Noisy data

[1] 欧陈委.K均值聚类算法的研究与改进[D].长沙:长沙理工大学,2011
[2] 张俊生.数据挖掘中的聚类方法及其应用研究[D].天津:天津理工大学,2012
[3] Anil K J.Data clustering:50year beyond K-Means[J].Pattern Recognition Letters,2010,31(08):651-666
[4] 吴晓蓉.K-均值聚类算法初始质心选取相关问题的研究[D].长沙:湖南大学,2008
[5] Fayyad U,Reina C,Bradley P S.Initialization of Iterative Refinement Clustering Algorithms[C]∥Proc of the Fourth International Conference on Knowledge Discovery and Data Mining.1998:194-198
[6] Adil M,Bafirov,Julien U.Fast modified global k-means algorithm for incremental cluster construction[J].Pattern Recognition,2011,44(4):866-876
[7] 曹志宇,张忠林,李元韬.快速查找初始聚类质心的K_means算法 [J].兰州交通大学学报,2009,28(6):15-18
[8] 张真,任贺宇.一种基于动态网格技术的K-means初始质心选取算法[J].微电子学与计算机,2013,30(6):101-104
[9] 谢娟英,蒋帅,王春霞.一种改进的全局K-均值聚类算法[J].陕西师范大学学报,2010,38(2):18-22
[10] 谢娟英,郭文娟,谢维信.基于样本空间分布密度的初始聚类中心优化K-均值算法[J].计算机应用研究,2012,29(3):888-892
[11] 杨燕,靳蕃,Kamel M.聚类有效性评价综述[J].计算机应用研究,2008,25(6):1631-1632
[12] 张惟皎,刘春煌,李芳玉.聚类质量的评价方法[J].计算机工程,2005,31(20):10-12
[13] Hubert L,Arabie P.Comparing partitions [J].Journal of Classification,1985,2(1):193-218
[14] 王纵虎,刘志镜子,陈东辉.基于粒子群优化的模糊C-均值聚类算法研究[J].计算机科学,2012,39(9):166-169

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!