Computer Science (计算机科学) ›› 2014, Vol. 41 ›› Issue (Z11): 288-293.

• Data Mining •

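Review of Clustering Method

JIN Jian-guo

  1. Department of Applied Mathematics, College of Science, Zhejiang University of Technology, Hangzhou 310032
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding:
    This work was supported by the Natural Science Foundation of Zhejiang Province (Y1100837) and the Zhejiang Province 151 Talent Training Program.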

Abstract: This paper reviews clustering methods. It systematically discusses four key issues in clustering research: how to define the "distance" function between data points, how to determine the number of clusters, efficient and effective clustering algorithms, and how to evaluate the quality of a clustering algorithm. The advantages and disadvantages of the various types of clustering algorithms are analyzed, and development trends in cluster analysis research are pointed out.

Key words: Clustering, Distance, Cluster number, Algorithm evaluation

