Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 425-429. doi: 10.11896/jsjkx.190700071

• Big Data & Data Science •

Application of Improved DBSCAN Algorithm on Spark Platform

DENG Ding-sheng   

  1. School of Science and Technology, Sichuan Minzu College, Kangding, Sichuan 626001, China
  • Online:2020-11-15 Published:2020-11-17
  • About author: DENG Ding-sheng, born in 1978, associate professor. His main research interests include algorithm analysis and design.
  • Supported by:
    This work was supported by the Key Project of Natural Science of Sichuan Minzu College (XYZB19001ZA), the Key Project of Natural Science of Sichuan Provincial Education Department (17ZA0295), the 2017 Applied Demonstration Course Project of Sichuan Minzu College (sfkc201705) and the National Natural Science Foundation of China (11461058).

Abstract: To address the high memory consumption of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm, this paper combines an improved DBSCAN algorithm with the parallel computing model of the Spark platform and applies it to the clustering of massive data, which greatly reduces the memory usage of the algorithm. Simulation experiments show that the proposed parallel method effectively alleviates memory shortage. The method is also compared with DBSCAN clustering on the Hadoop platform: the Spark-based approach obtains better computing performance, with a speedup ratio about 24% higher than on the Hadoop platform. The proposed method can also be used to evaluate the strengths and weaknesses of the DBSCAN algorithm in clustering tasks.
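
For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of the classic DBSCAN procedure in Python. It is not the improved algorithm or the Spark parallelization described in the paper, whose details are not given in this abstract. The names `region_query`, `dbscan`, `eps` and `min_pts` are illustrative assumptions. The sketch recomputes neighborhoods on demand rather than storing a full distance matrix, which trades running time for lower memory use.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

On this toy input the two dense groups receive cluster ids 0 and 1, and the isolated point is labeled as noise. A distributed version would typically partition the points, cluster each partition, and merge clusters across partition boundaries; how the paper does this is not stated here.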

Key words: Parallel computing, DBSCAN, Clustering algorithm, Spark, Clustering acceleration ratio

CLC Number: 

  • TP391
