Computer Science ›› 2014, Vol. 41 ›› Issue (8): 245-249. doi: 10.11896/j.issn.1002-137X.2014.08.052

• Artificial Intelligence •

Cluster Algorithm Based on Edge Density Distance

WU Ming-hui, ZHANG Hong-xi, JIN Cang-hong and CAI Wen-ming

  1. Department of Computer Science and Engineering, Zhejiang University City College, Hangzhou 310015; College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the Zhejiang Provincial Key Science and Technology Innovation Team Project (2010R50009)

Abstract: Traditional grid-based clustering algorithms produce low-quality clusters, while density-based clustering algorithms have high time complexity. To address the weaknesses of both families, this paper combines their ideas and proposes a new clustering algorithm. The algorithm introduces the edge density distance as a new density measure and builds on it to define a cluster and the clustering process. First, the data are partitioned into grid cells and initial statistics about the data are recorded for use in the later k-nearest-neighbor search. A strategy borrowed from bucket sort is then used to select the next cluster center quickly. Finally, clustering proceeds by iteratively searching the not-yet-examined k nearest neighbors of the current cluster and testing whether each one is density-reachable. Theoretical analysis and experimental results show that the algorithm maintains high clustering accuracy while achieving low, nearly linear time complexity.

Key words: Clustering, Grid, Density, Caed, DBSCAN, K-means
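
The three stages named in the abstract (grid partitioning, bucket-sorted selection of cluster centers, and expansion over k nearest neighbors) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not define the edge density distance or the density-reachability test, so the sketch substitutes stand-ins and labels them as assumptions. The integer "density" here is simply the number of a point's k nearest neighbors that lie within a radius `eps`, and a neighbor counts as reachable when it lies within `eps`. The function names `build_grid`, `knn`, and `cluster`, the parameters `cell`, `k`, and `eps`, and the restriction to 2-D points are all hypothetical choices made for illustration. The sketch also assumes `eps <= cell`, so that searching the 3x3 block of cells around a point cannot miss a neighbor within `eps`.

```python
import math
from collections import defaultdict, deque


def build_grid(points, cell):
    """Map each grid cell (ix, iy) to the indices of the 2-D points inside it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    return grid


def knn(points, grid, cell, i, k):
    """Up to k nearest neighbours of point i, searched only in the 3x3
    block of cells around it (an approximation of a full k-NN search)."""
    x, y = points[i]
    cx, cy = math.floor(x / cell), math.floor(y / cell)
    candidates = [j
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  for j in grid.get((cx + dx, cy + dy), ())
                  if j != i]
    candidates.sort(key=lambda j: math.dist(points[i], points[j]))
    return candidates[:k]


def cluster(points, cell, k, eps):
    """Label points with cluster ids (0, 1, ...); -1 marks unclustered points."""
    grid = build_grid(points, cell)
    neighbours = [knn(points, grid, cell, i, k) for i in range(len(points))]

    # ASSUMPTION: stand-in integer density, not the paper's edge density
    # distance. It counts how many of the k neighbours lie within eps.
    density = [sum(math.dist(points[i], points[j]) <= eps for j in nb)
               for i, nb in enumerate(neighbours)]

    # Bucket sort: bucket d holds every point whose density equals d (0..k),
    # so the next densest candidate centre is found without comparisons.
    buckets = [[] for _ in range(k + 1)]
    for i, d in enumerate(density):
        buckets[d].append(i)

    labels = [-1] * len(points)
    next_label = 0
    for d in range(k, 0, -1):  # densest buckets first; density 0 is never a centre
        for seed in buckets[d]:
            if labels[seed] != -1:
                continue  # already absorbed by an earlier cluster
            labels[seed] = next_label
            queue = deque([seed])
            while queue:
                i = queue.popleft()
                for j in neighbours[i]:
                    # ASSUMPTION: stand-in reachability test (distance <= eps).
                    if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                        labels[j] = next_label
                        queue.append(j)
            next_label += 1
    return labels
```

With two tight, well-separated groups of points and one isolated point, calling `cluster(points, cell=1.0, k=3, eps=0.5)` gives each group its own label and leaves the isolated point at -1. Because each point is compared only with points in nearby cells and the bucket pass is linear in the number of points, the cost stays close to linear when the cells are sparsely populated, which matches the kind of behaviour the abstract reports.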

