Computer Science, 2016, Vol. 43, Issue (Z6): 380-383. doi: 10.11896/j.issn.1002-137X.2016.6A.090


Survey of Clustering Algorithms for Big Data

HAI Mo   

Online: 2018-11-14; Published: 2018-11-14

Abstract: With the rapid growth of data volume, clustering large-scale data has become a challenge. Clustering algorithms for big data are important in applications such as stock investment analysis in traditional finance and customer segmentation in Internet finance. This survey first categorizes the existing clustering algorithms for big data and compares the advantages and disadvantages of each category. It then summarizes the problems in existing research and finally outlines future research directions.

Key words: Big data, Clustering algorithms, Stock investment analysis, Customer segmentation
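The abstract does not describe a specific algorithm. As a generic illustration of one common way to scale clustering to large data, parallelizing k-means over data partitions, the sketch below runs each k-means iteration in a map/reduce style: every chunk independently produces per-centroid coordinate sums and counts, and a combine step merges those partial results into new centroids. The chunking scheme, the function names (`map_chunk`, `reduce_partials`, `kmeans_chunked`), and the toy data are hypothetical choices for illustration, not the survey's method or taxonomy.

```python
"""Hypothetical sketch: k-means iterations computed map/reduce style over data chunks.

Generic illustration only; not the method of the surveyed paper.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid with the smallest squared Euclidean distance."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best


def map_chunk(chunk: Sequence[Point], centroids: Sequence[Point]):
    """'Map' step: per-centroid coordinate sums and point counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = nearest(p, centroids)
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    return sums, counts


def reduce_partials(partials, centroids: Sequence[Point]) -> List[Point]:
    """'Reduce' step: add up the partial sums and recompute each centroid as a mean."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for t in range(dim):
                total_sums[j][t] += sums[j][t]
    new_centroids: List[Point] = []
    for j in range(k):
        if total_counts[j] == 0:
            new_centroids.append(tuple(centroids[j]))  # keep an empty cluster's centroid
        else:
            new_centroids.append(tuple(s / total_counts[j] for s in total_sums[j]))
    return new_centroids


def kmeans_chunked(chunks, centroids: Sequence[Point], iters: int = 10) -> List[Point]:
    """Run a fixed number of iterations; in a real system each map call could run on a separate worker."""
    current = [tuple(c) for c in centroids]
    for _ in range(iters):
        partials = [map_chunk(chunk, current) for chunk in chunks]
        current = reduce_partials(partials, current)
    return current


if __name__ == "__main__":
    chunks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_chunked(chunks, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Because each chunk only reports k sums and k counts, the data exchanged per iteration does not grow with the number of points, which is what makes this decomposition attractive for large datasets; the result is identical to running the same iterations on the unsplit data.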
