Computer Science (计算机科学), 2017, Vol. 44, Issue (Z11): 403-406. doi: 10.11896/j.issn.1002-137X.2017.11A.085
XIA Qing-ya (夏庆亚)
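Abstract: The clustering by fast search and find of density peaks algorithm (DPC) requires costly computations between data points, and the final number of cluster centers must be selected manually from the decision graph. To address these problems, an improved algorithm, GADPC, is proposed that automatically selects cluster centers based on density peaks and grids. First, borrowing the idea of the CLIQUE grid clustering algorithm, GADPC no longer operates on point objects; instead, points are mapped to grid cells and the cells are treated as the clustering objects, which reduces the number of distance computations between data points and the number of clustering operations in DPC. Second, an improved criterion for determining the number of cluster centers selects that number automatically and more accurately. Finally, grid edge points and noise points are handled using the similarity between the point objects inside a cell and the adjacent cells. Experiments compare the algorithms on synthetic data-mining datasets provided by UEF (University of Eastern Finland) and on natural datasets from UCI. The clustering evaluation index (Rand Index) shows that, on large datasets, the clustering quality of the improved algorithm is no lower than that of DPC and K-means, and that it improves the processing efficiency of DPC.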
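The grid-mapping step described in the abstract can be illustrated with a minimal Python sketch. It is not the GADPC implementation: the abstract does not give the paper's density definition, its center-count criterion, or its edge and noise handling, so this sketch assumes a uniform cell size, uses the number of points in a cell as the cell density, and computes the standard DPC quantities (density `rho`, distance `delta` to the nearest denser object, and their product `gamma`) over occupied cells instead of over points. The function name `grid_density_peaks` and the parameter `cell_size` are hypothetical.

```python
import math
from collections import Counter


def grid_density_peaks(points, cell_size):
    """Compute DPC-style rho, delta, and gamma per occupied grid cell.

    Illustrative sketch only: cell density is assumed to be the point count,
    and distances are measured between integer cell indices.
    """
    # Map each point to the integer index of the grid cell that contains it.
    cell_of = [tuple(math.floor(x / cell_size) for x in p) for p in points]

    # Assumed density: number of points that fall in each occupied cell.
    rho = Counter(cell_of)

    # Visit cells from densest to sparsest; ties are broken by cell index.
    order = sorted(rho, key=lambda c: (-rho[c], c))

    delta = {}
    for rank, cell in enumerate(order):
        if rank == 0:
            # DPC convention for the densest object: use the largest
            # distance to any other object (0.0 if it is the only cell).
            delta[cell] = max((math.dist(cell, o) for o in order[1:]), default=0.0)
        else:
            # Distance to the nearest cell that ranks higher in density.
            delta[cell] = min(math.dist(cell, o) for o in order[:rank])

    # Cells with both high density and high delta are center candidates.
    gamma = {cell: rho[cell] * delta[cell] for cell in order}
    return cell_of, dict(rho), delta, gamma


# Usage: two dense groups and one isolated point (made-up data).
points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),  # group near (0, 0)
    (5.3, 5.4), (5.5, 5.6), (5.7, 5.3),              # group near (5, 5)
    (2.6, 2.6),                                      # isolated point
]
cell_of, rho, delta, gamma = grid_density_peaks(points, cell_size=1.0)
candidates = sorted(gamma, key=gamma.get, reverse=True)[:2]
```

Here the number of candidates (2) is chosen by hand; the paper's contribution is a criterion that picks this number automatically, which this sketch does not attempt to reproduce. The benefit it does show is that distances are computed between occupied cells (3 here) and not between all points (8 here).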