Survey of Clustering Algorithms for Big Data

doi:10.11896/j.issn.1002-137X.2016.6A.090

Abstract

Abstract: As data volumes grow rapidly, clustering large-scale data effectively has become a challenging research problem. Clustering algorithms for big data are valuable in financial applications such as stock investment analysis in the traditional finance industry and customer segmentation in the Internet finance industry. This paper classifies the existing clustering algorithms for big data in detail, compares the advantages and disadvantages of each type, summarizes the problems in existing research, and finally outlines directions for future research.

Key words: Big data, Clustering algorithms, Stock investment analysis, Customer segmentation

HAI Mo. Survey of Clustering Algorithms for Big Data[J]. Computer Science, 2016, 43(Z6): 380-383. https://doi.org/10.11896/j.issn.1002-137X.2016.6A.090

