Automatically Selecting Clustering Centers Algorithm Based on Density Peak and Grid

doi:10.11896/j.issn.1002-137X.2017.11A.085

Abstract

Abstract: The clustering by fast search and find of density peaks (DPC) algorithm computes a large number of distances between point objects, has high computational complexity in its clustering process, and requires the final cluster centers to be selected manually. To address these shortcomings, an improved algorithm that automatically selects cluster centers based on density peaks and a grid (GADPC) is proposed. First, following the idea of the CLIQUE algorithm, all data points are mapped onto a grid and clustering is performed on grid objects rather than point objects, which reduces the distance computations and clustering complexity of DPC. Second, the number of cluster centers is determined more accurately, so that cluster centers can be selected automatically and more precisely. Finally, the relative similarity between the points inside a grid cell and the points in adjacent grid cells is evaluated, so that edge points and noise points are handled well. Experiments on synthetic machine learning data sets from UEF and real-world data sets from UCI show, in terms of the Rand index, that on large data sets the clustering quality of the improved algorithm is not lower than that of DPC and K-means, and that it is more efficient than DPC.

Key words: Data mining, Clustering analysis, Density peak, Grid, Similarity
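
The first two steps summarized in the abstract (mapping points to grid cells, then looking for density peaks among cells instead of points) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function name `grid_density_peaks`, the fixed `cell_size`, the use of the point count per cell as density, and the ranking by the product of density and distance. The number of centers is passed in as `n_centers`, whereas GADPC is described as determining it automatically, so this is not the paper's method.

```python
from collections import Counter
from math import dist, floor


def grid_density_peaks(points, cell_size, n_centers):
    """Grid-based density-peak sketch (hypothetical helper, not the GADPC code).

    Returns (cells, rho, delta, centers), where cells are integer grid indices.
    """
    # Step 1: map every point to the integer index of the grid cell it falls in.
    counts = Counter(tuple(floor(x / cell_size) for x in p) for p in points)
    cells = list(counts)
    rho = [counts[c] for c in cells]  # assumed density: points per non-empty cell

    # Step 2: delta is the distance to the nearest cell of strictly higher density.
    # A cell with no denser cell gets the largest distance to any other cell.
    delta = []
    for i, ci in enumerate(cells):
        higher = [dist(ci, cj) for j, cj in enumerate(cells) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max((dist(ci, cj) for cj in cells), default=0.0))

    # Step 3: rank cells by rho * delta and keep the top n_centers as centers.
    order = sorted(range(len(cells)), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [cells[i] for i in order[:n_centers]]
    return cells, rho, delta, centers
```

Because distances are computed between non-empty cells rather than between all pairs of points, the quadratic step runs over the number of occupied cells, which is the source of the savings the abstract refers to. Cells that tie for the highest density are all treated as having no denser neighbor in this sketch.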

XIA Qing-ya. Automatically Selecting Clustering Centers Algorithm Based on Density Peak and Grid[J]. Computer Science, 2017, 44(Z11): 403-406.

