Computer Science, 2020, Vol. 47, Issue (11A): 425-429. doi: 10.11896/jsjkx.190700071

• Big Data & Data Science

Application of Improved DBSCAN Algorithm on Spark Platform

DENG Ding-sheng   

  1. School of Science and Technology, Sichuan Minzu College, Kangding, Sichuan 626001, China
  • Online: 2020-11-15; Published: 2020-11-17
  • About author: DENG Ding-sheng, born in 1978, associate professor. His main research interests include algorithm analysis and design.
  • Supported by:
    This work was supported by the Key Project of Natural Science of Sichuan Minzu College (XYZB19001ZA), the Key Project of Natural Science of Sichuan Provincial Education Department (17ZA0295), the 2017 Applied Demonstration Course Project of Sichuan Minzu College (sfkc201705) and the National Natural Science Foundation of China (11461058).

Abstract: To address the high memory occupancy of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm, this paper combines an improved DBSCAN clustering algorithm with the parallel clustering computation model of the Spark platform to cluster and process massive data, which greatly reduces the memory usage of the algorithm. Simulation experiments show that the proposed parallel computing method effectively alleviates the memory shortage. The method can also be used to evaluate the clustering effect of the DBSCAN algorithm on the Hadoop platform. Comparing and analyzing the two clustering approaches shows that the Spark-based method achieves better computing performance, with a speedup about 24% higher than on the Hadoop platform. The proposed method can also be used to evaluate the strengths and weaknesses of the DBSCAN clustering algorithm.
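The abstract does not describe the improved algorithm or its Spark partitioning scheme. As a point of reference, the following is a minimal single-machine sketch of the standard DBSCAN procedure in plain Python. It is not the paper's improved or parallel variant, and the function names `region_query` and `dbscan` and the parameters `eps` and `min_pts` are illustrative assumptions. A point counts as a core point when its eps-neighbourhood (including the point itself) contains at least `min_pts` points. Clusters grow outward from core points, and points that no cluster reaches are labelled noise (-1).

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Return indices of all points within distance eps of points[i], including i."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Standard (non-parallel) DBSCAN; returns one cluster label per point, -1 = noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1          # i is an unvisited core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue                # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return [label for label in labels if label is not None]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
           (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
           (50, 50)]                                # isolated point
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force `region_query` performs O(n²) distance computations, and the whole data set is held in one process's memory. The abstract identifies this memory pressure on massive data as the motivation for distributing the clustering work across Spark.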

Key words: Clustering acceleration ratio, Clustering algorithm, DBSCAN, Parallel computing, Spark

CLC Number: TP391