计算机科学 ›› 2020, Vol. 47 ›› Issue (11A): 425-429.doi: 10.11896/jsjkx.190700071
邓定胜
DENG Ding-sheng
摘要: 针对DBSCAN(Density-Based Spatial Clustering of Applications with Noise)聚类算法内存占用率较高的问题,文中将改进的DBSCAN聚类算法与Spark平台并行聚类计算理论相结合,对海量数据采用分而治之的办法进行聚类处理,大幅减小了算法对内存的占用率。实验仿真结果表明,所提出的并行计算方法能够有效缓解内存不足的问题,并且该方法也能够用来评价DBSCAN聚类算法在Hadoop平台下的聚类分析效果,还能对两种聚类方法进行对比分析,从而获得较好的计算性能;且其比在Hadoop平台上的计算加速度提高了24%左右,因此可以用以评价DBSCAN聚类算法在聚类处理方面的优劣。
中图分类号:
[1] AFSAR M,TAYARANI-N M H,AZIZ M.An adaptive competition-based clustering approach for wireless sensor networks[J].Telecommunication Systems,2016,61(1):181-204. [2] ZHAO W,XIA G S,GOU Z J,et al.An Improved DBSCAN Algorithm[J].Journal of Sichuan Normal University (Natural Science Edition),2013(2):114-116. [3] WANG L,WU L L,FU D M.A Density-Dased Fuzzy Adaptive Clustering Algorithm[J].Journal of University of Science and Technology Beijing,2014(11):312-316. [4] ZHANG Q,WANG X,WANG X.An OPTICS Clustering-Based Anomalous Data Filtering Algorithm for Condition Monitoring of Power Equipment[C]//Revised Selected Papers of the Third Ecml Pkdd Workshop on Data Analytics for Renewable Energy Integration.Springer-Verlag New York,Inc.,2015:123-134. [5] HUANG H,YOO S,YU D,et al.Density-Aware ClusteringBased on Aggregated Heat Kernel and Its Transformation[J].Acm Transactions on Knowledge Discovery from Data,2015,9(4):29-32. [6] YE X,SAKURAI T.Spectral clustering using robust similarity measure based on closeness of shared Nearest Neighbors[C]//International Joint Conference on Neural Networks.IEEE,2015:1-8. [7] SINGH G,KAUR J,MULGE Y.Performance evaluation of enhanced hierarchical and partitioning based clustering algorithm (EPBCA) in data mining[C]//International Conference on Applied and Theoretical Computing and Communication Technology.IEEE,2015:805-810. [8] WANG B,ZHANG C,SONG L,et al.Design and optimization of DBSCAN Algorithm based on CUDA[J].Computer Science,2015,40(5):553-556. [9] LENG Y,CHEN Z,ZHONG F,et al.BRDPHHC:A Balance RDF Data Partitioning Algorithm Based on Hybrid Hierarchical Clustering[C]//IEEE,International Conference on High PERFORMANCE Computing and Communications.IEEE,2015:1755-1760. [10] LIN W H,TAN X J,LIU F J,et al.A new directional query method for polygon dataset in spatial database[J].Earth Science Informatics,2015,8(4):775-786. [11] EZUGWU A E,FRINCU M E,JUNAIDU S B.A Multiagent-Based Approach to Scheduling of Multi-component Applications in Distributed Systems[J].Advances in Intelligent Systems and Computing,2015,347:1-12. [12] ZHANG J,YOU S,GRUENWALD L.Large-scale spatial data processing on GPUs and GPU-accelerated clusters[J].Sigspatial Special,2015,6(3):27-34. [13] YANG J.From Google File System to Omega:A Decade of Advancement in Big Data Management at Google[C]//IEEE First International Conference on Big Data Computing Service and Applications.IEEE,2015:249-255. [14] WANG Y,ZHANG L,TAN J,et al.HydraDB:a resilient RDMA-driven key-value middleware for in-memory cluster computing[C]//2015 SC-International Conference for High Perfor-mance Computing,Networking,Storage and Analysis.IEEE,2017:22. [15] WANG X,WU Y,JIANG X H,et al.Incremental Parallel Fast Clustering Based on DBSCAN Algorithm Under Large-scale Data Sets[J].Computer Applications and Software,2018,35(4):269-280. [16] APILETTI D,GARZA P,PULVIRENTI F.A Review of Scalable Approaches for Frequent Itemset Mining[C]//East European Conference on Advances in Databases and Information Systems.Springer International Publishing,2015:243-247. [17] WANG B,ZHANG C,SONG L,et al.Design and optimization of DBSCAN Algorithm based on CUDA[J].Computer Science,2015,40(5):553-556. [18] CHEN R,ZHANG Y,ZHANG J,et al.Design and Optimizations of the MD5 Crypt Cracking Algorithm Based on CUDA[C]//International Conference on Cloud Computing.Springer International Publishing,2014:155-164. [19] NING J F.DBSCAN Text Clustering Algorithm Based on Spark Framework[J].Journal of Shantou University (Natural Science Edition),2018,33(2):73-80. [20] PENG X Y,YANG Y B,WANG C D,et al.An Efficient Parallel Nonlinear Clustering Algorithm Using MapReduce[C]//2016 IEEE International Parallel and Distributed Processing Sympo-sium Workshops.IEEE,2016:1473-1476. |
[1] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[2] | 陈鑫, 李芳, 丁海昕, 孙唯哲, 刘鑫, 陈德训, 叶跃进, 何香. 面向国产异构众核架构的CFD非结构网格计算并行优化方法 Parallel Optimization Method of Unstructured-grid Computing in CFD for DomesticHeterogeneous Many-core Architecture 计算机科学, 2022, 49(6): 99-107. https://doi.org/10.11896/jsjkx.210400157 |
[3] | 张亚迪, 孙悦, 刘锋, 朱二周. 结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究 Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index 计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148 |
[4] | 戴宏亮, 钟国金, 游志铭, 戴宏明. 基于Spark的舆情情感大数据分析集成方法 Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark 计算机科学, 2021, 48(9): 118-124. https://doi.org/10.11896/jsjkx.210400280 |
[5] | 张仁杰, 陈伟, 杭梦鑫, 吴礼发. 基于变分自编码器的不平衡样本异常流量检测 Detection of Abnormal Flow of Imbalanced Samples Based on Variational Autoencoder 计算机科学, 2021, 48(7): 62-69. https://doi.org/10.11896/jsjkx.200600022 |
[6] | 俞建业, 戚湧, 王宝茁. 基于Spark的车联网分布式组合深度学习入侵检测方法 Distributed Combination Deep Learning Intrusion Detection Method for Internet of Vehicles Based on Spark 计算机科学, 2021, 48(6A): 518-523. https://doi.org/10.11896/jsjkx.200700129 |
[7] | 李杉, 许新征. 基于双角度并行剪枝的VGG16优化方法 Parallel Pruning from Two Aspects for VGG16 Optimization 计算机科学, 2021, 48(6): 227-233. https://doi.org/10.11896/jsjkx.200800016 |
[8] | 傅天豪, 田鸿运, 金煜阳, 杨章, 翟季冬, 武林平, 徐小文. 一种面向构件化并行应用程序的性能骨架分析方法 Performance Skeleton Analysis Method Towards Component-based Parallel Applications 计算机科学, 2021, 48(6): 1-9. https://doi.org/10.11896/jsjkx.201200115 |
[9] | 何亚茹, 庞建民, 徐金龙, 朱雨, 陶小涵. 基于神威平台的Floyd并行算法的实现和优化 Implementation and Optimization of Floyd Parallel Algorithm Based on Sunway Platform 计算机科学, 2021, 48(6): 34-40. https://doi.org/10.11896/jsjkx.201100051 |
[10] | 冯凯, 马鑫玉. (n,k)-冒泡排序网络的子网络可靠性 Subnetwork Reliability of (n,k)-bubble-sort Networks 计算机科学, 2021, 48(4): 43-48. https://doi.org/10.11896/jsjkx.201100139 |
[11] | 汤鑫瑶, 张正军, 储杰, 严涛. 基于自然最近邻的密度峰值聚类算法 Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor 计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112 |
[12] | 胡蓉, 阳王东, 王昊天, 罗辉章, 李肯立. 基于GPU加速的并行WMD算法 Parallel WMD Algorithm Based on GPU Acceleration 计算机科学, 2021, 48(12): 24-28. https://doi.org/10.11896/jsjkx.210600213 |
[13] | 王茂光, 杨行. 一种基于AP-Entropy选择集成的风控模型和算法 Risk Control Model and Algorithm Based on AP-Entropy Selection Ensemble 计算机科学, 2021, 48(11A): 71-76. https://doi.org/10.11896/jsjkx.210200110 |
[14] | 王卫东, 徐金慧, 张志峰, 杨习贝. 基于密度峰值聚类的高斯混合模型算法 Gaussian Mixture Models Algorithm Based on Density Peaks Clustering 计算机科学, 2021, 48(10): 191-196. https://doi.org/10.11896/jsjkx.200800191 |
[15] | 张煜, 陆亿红, 黄德才. 基于密度峰值的加权犹豫模糊聚类算法 Weighted Hesitant Fuzzy Clustering Based on Density Peaks 计算机科学, 2021, 48(1): 145-151. https://doi.org/10.11896/jsjkx.200400043 |
|