Computer Science, 2015, Vol. 42, Issue (Z6): 491-499.


General Overview on Clustering Algorithms

WU Yu-hong   

Online: 2018-11-14   Published: 2018-11-14

Abstract: Data mining techniques can extract potentially useful knowledge from vast amounts of data, giving stored data new significance in the information age. With the rapid development of data mining, grid clustering, an important part of data mining, has been widely applied in fields such as pattern recognition, data analysis, image processing, and market research, and research on grid clustering algorithms has become a highly active topic in data mining. In this thesis, the author presents the theory of data mining and analyzes grid clustering algorithms in depth. Based on an analysis of traditional grid clustering algorithms, we propose several improved grid clustering algorithms that enhance clustering quality and efficiency compared with the traditional ones. Based on an analysis of traditional algorithms for multi-density data, we propose a grid-based clustering algorithm for multi-density data (GDD). GDD is a multi-stage clustering method that integrates grid-based clustering, density threshold descending, and border point extraction. As the research shows, GDD not only clusters correctly but also finds outliers in the dataset, effectively addressing the limitation that traditional grid algorithms can either only cluster or only find outliers. The precision of GDD is better than that of SNN. GDD works well on datasets of even density and on many multi-density datasets; it can discover clusters of arbitrary shape; and it is insensitive to the input order of the data and to noise and outliers, although it does not cluster some multi-density datasets well.

Key words: Grid clustering, density threshold descending, multi-stage clustering
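The abstract names the ingredients of GDD (grid-based clustering, density threshold descending, border point extraction, and outlier detection) but not its exact rules. As a minimal illustration, the Python sketch below combines those ideas under stated assumptions; it is not the author's GDD. Assumed here: 2-D points, square cells of side `cell_size`, 8-neighbour cell adjacency, a caller-supplied list of per-cell point-count `thresholds`, and a made-up border rule that absorbs a lower-density component into any higher-density cluster it touches. The names `grid_cluster`, `cell_of`, and `neighbors` are hypothetical.

```python
from collections import deque
from math import floor


def cell_of(point, cell_size):
    """Map a 2-D point to the integer index of the square grid cell containing it."""
    return (floor(point[0] / cell_size), floor(point[1] / cell_size))


def neighbors(cell):
    """Yield the 8 cells that share an edge or a corner with `cell`."""
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                yield (x + dx, y + dy)


def grid_cluster(points, cell_size, thresholds):
    """Hypothetical grid clustering with descending density thresholds.

    Returns one label per point: a cluster id >= 0, or -1 for outliers
    (points whose cell never reaches even the lowest threshold).
    """
    # Stage 1: quantize the space and count points per occupied cell.
    counts = {}
    for p in points:
        c = cell_of(p, cell_size)
        counts[c] = counts.get(c, 0) + 1

    label = {}  # cell -> cluster id
    next_id = 0
    # Stage 2: visit thresholds from the highest density to the lowest.
    for t in sorted(thresholds, reverse=True):
        dense = {c for c, n in counts.items() if n >= t and c not in label}
        seen = set()
        for start in sorted(dense):  # sorted so cluster ids are deterministic
            if start in seen:
                continue
            # Breadth-first search over adjacent dense, still-unlabeled cells.
            component, queue = [], deque([start])
            seen.add(start)
            while queue:
                c = queue.popleft()
                component.append(c)
                for nb in neighbors(c):
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            # Stage 3 (assumed border rule): a component touching a cluster
            # found at a higher threshold is absorbed as that cluster's border;
            # otherwise it starts a new, lower-density cluster.
            touching = sorted({label[nb] for c in component
                               for nb in neighbors(c) if nb in label})
            if touching:
                cid = touching[0]
            else:
                cid, next_id = next_id, next_id + 1
            for c in component:
                label[c] = cid

    # Stage 4: points in cells that were never labeled are reported as outliers.
    return [label.get(cell_of(p, cell_size), -1) for p in points]


pts = ([(0.1 * i, 0.1) for i in range(10)]                 # dense core, cell (0, 0)
       + [(1.2, 0.5), (1.4, 0.5), (1.6, 0.5), (1.8, 0.5)]  # sparser border, cell (1, 0)
       + [(5.1, 5.1), (5.5, 5.5), (5.9, 5.2)]              # separate sparse cluster
       + [(10.5, 0.5)])                                     # isolated point
print(grid_cluster(pts, 1.0, [8, 3]))
```

With this sample data, the ten core points and the four border points share cluster 0, the three points near (5.5, 5.5) form cluster 1, and the isolated point at (10.5, 0.5) is labeled -1. Processing thresholds from high to low lets dense regions claim their cells first, which is the intuition behind density threshold descending; the absorb-on-contact rule is only one simple choice, and in this sketch it also merges clusters of different density whose cells merely touch.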
