计算机科学 ›› 2019, Vol. 46 ›› Issue (11): 216-221.doi: 10.11896/jsjkx.181001846
陈春涛, 陈优广
CHEN Chun-tao, CHEN You-guang
摘要: DP(Clustering by Fast Search and Find of Density Peaks)是一种新提出的基于局部密度和距离的聚类算法,具有能够发现任意形状的类簇、易于理解并且可以高效划分数据的优点。但是该算法无法处理单个类簇中同时存在多个密度峰值的情况,并且数据划分不稳定,容易导致连锁错误划分;当类簇间的密度差异较大时,其无法准确识别稀疏的类簇。为弥补以上不足,提出一种基于影响空间的稳健密度峰值聚类算法。该改进算法通过邻近数据计算局部密度,增强对小规模类簇的识别能力。为了提高数据划分的稳定性,引入了影响空间,并定义了一种新的对称关系,提出了一种新的分配策略。其通过计算目标数据与邻近数据的局部密度比值,并对影响空间进行加权,使算法能够处理具有多密度分布特征的数据。基于人工合成数据集和UCI数据集的模拟对比实验表明,提出的改进策略增强了算法对稀疏类簇的识别能力,提高了数据划分的稳定性,在NMI和Acc评价指标方面取得了较优的结果。
中图分类号:
[1]MACQUEEN J.Some Methods for Classification and Analysisof MultiVariate Observations[C]∥Berkeley Symposium on Mathematical Statistics and Probability.Berkeley:University of California Press,1966:281-297. [2]KARYPIS G,HAN E H,KUMAR V.CHAMELEON.A hierarchical clustering algorithm using dynamic modeling[J].Computer,1999,32(8):68-75. [3]ESTER M,KRIEGEL H P,XU X.A Density-based Algorithmfor Discovering Clusters in Large Spatial Databases with Noise[C]∥International Conference on Knowledge Discovery and Data Mining.Palo Alto:AAAI Press,1996:226-231. [4]WANG W,YANG J,MUNTZ R R.STING:A Statis-tical Information Grid Approach to Spatial Data Mining[C]∥Interna-tional Conference onVery Large Data Bases.San Mateo:Morgan Kaufmann.1997:186-195. [5]HU Q,WU J,BAI L,et al.Fast K-means for Large Scale Clustering[C]∥Conference on Information and Knowledge Management.New York:ACM,2017:2099-2102. [6]PARK H S,JUN C H.A Simple and Fast Algo-rithm for K-medoids Clustering[J].Expert Systems with Applications,2009,36(2):3336-3341. [7]HE Y,TAN H,LUO W,et al.MR-DBSCAN:An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce[C]∥International Conference on Parallel and Distributed Systems.New York:IEEE,2012:473-480. [8]JAHIRABADKAR S,KULKARNI P.Algorithm to Determine $\ell$-distance Parameter in Density based Clustering[J].Expert Systems with Ap-plications,2014,41(6):2939-2946. [9]NANDA S J,PANDA G.Design of Computationally Efficient Density-based Clustering Algorithms[J].Data & Knowledge Engineering,2015,95:23-38. [10]BAI X,YANG P,SHI X.An Overlapping Community Detection Algorithm Based on Density Peaks[J].Neurocomputing,2016,226(22):7-15. [11]WANG H Y,ZHANG T F,MA F M.Rough K-means Algorithm with Self-adaptive Weights Measurement Based on Space Distance[J].Computer Science,2018,45(7):190-196.(in Chinese) 王慧研,张腾飞,马福民.基于空间距离自适应权重度量的粗糙K-means算法[J].计算机科学,2018,45(7):190-196. [12]RODRIGUEZ A,LAIO A.Machine Learning.Clustering byFast Search and Find of Density Peaks[J].Science,2014,344(6191):1492-1496. [13]ZHANG Y,XIA Y,LIU Y,et al.Clustering Sentences withDensity Peaks for Multidocument Summarization[C]∥Confe-rence of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2015:1262-1267. [14]ZHANG Y,CHEN S.Efficient distributed density peaks forclustering large data sets in mapreduce[J].IEEE Transactions on Knowledge and Data Engineering,2016,28(12):3218-3230. [15]XU J,WANG G,LI T,et al.Fat node leading tree for datastream clustering with density peaks[J].Knowledge-Based Systems,2017,120:99-117. [16]CHEN Y,MA S,CHEN X,et al.Hyperspectral data clustering based on density analysis-ensemble[J].Remote Sensing Letters,2017,8(2):194-203. [17]LI Z J,TANG Y C.Comparative density Peaks Clustering[J].Expert Systemswith Applications,2018,95:236-247. [18]DING J,CHEN Z,HE X,et al.Clustering by Finding Density Peaks Based on Chebyshev’s inequality[C]∥ControlConfe-rence.New York:IEEE,2016:7169-7172. [19]GAO S Y,ZHOU X F,LI S.Clustering by Fast Search and Find of Density Peaks Based on Density Ratio[J].Computer Engineering and Applications,2017,53(16):10-17.(in Chinese) 高诗莹,周晓锋,李帅.基于密度比例的密度峰值聚类算法[J].计算机工程与应用,2017,53(16):10-17. [20]WANG S L,WANG D K,LI C Y,et al.Clustering by Fast Search and Find of Density Peaks with Data Field[J].Chinese Journal of Electronics,2016,25(3):397-402. [21]LIU Y,MA Z,YU F.Adaptive Density Peak Clustering Based on K-nearest Neighbors with Aggregating Strategy[J].Know-ledge Based Systems,2017,133:208-220. [22]JIN W,TUNG A K H,HAN J,et al.Ranking Outliers Using Symmetric Neighborhood Relationship[C]∥Pacific-Asia Conference on Knowledge Discovery and Data Mining.Berlin Heidelberg:Springer,2006:577-593. |
[1] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[2] | 郑文萍, 刘美麟, 杨贵. 一种基于节点稳定性和邻域相似性的社区发现算法 Community Detection Algorithm Based on Node Stability and Neighbor Similarity 计算机科学, 2022, 49(9): 83-91. https://doi.org/10.11896/jsjkx.220400146 |
[3] | 张亚迪, 孙悦, 刘锋, 朱二周. 结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究 Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index 计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148 |
[4] | 李杉, 许新征. 基于双角度并行剪枝的VGG16优化方法 Parallel Pruning from Two Aspects for VGG16 Optimization 计算机科学, 2021, 48(6): 227-233. https://doi.org/10.11896/jsjkx.200800016 |
[5] | 张久杰, 陈超, 聂宏轩, 夏玉芹, 张丽萍, 马占飞. 基于类粒度的克隆代码群稳定性实证研究 Empirical Study on Stability of Clone Code Sets Based on Class Granularity 计算机科学, 2021, 48(5): 75-85. https://doi.org/10.11896/jsjkx.200900062 |
[6] | 汤鑫瑶, 张正军, 储杰, 严涛. 基于自然最近邻的密度峰值聚类算法 Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor 计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112 |
[7] | 王茂光, 杨行. 一种基于AP-Entropy选择集成的风控模型和算法 Risk Control Model and Algorithm Based on AP-Entropy Selection Ensemble 计算机科学, 2021, 48(11A): 71-76. https://doi.org/10.11896/jsjkx.210200110 |
[8] | 王卫东, 徐金慧, 张志峰, 杨习贝. 基于密度峰值聚类的高斯混合模型算法 Gaussian Mixture Models Algorithm Based on Density Peaks Clustering 计算机科学, 2021, 48(10): 191-196. https://doi.org/10.11896/jsjkx.200800191 |
[9] | 张煜, 陆亿红, 黄德才. 基于密度峰值的加权犹豫模糊聚类算法 Weighted Hesitant Fuzzy Clustering Based on Density Peaks 计算机科学, 2021, 48(1): 145-151. https://doi.org/10.11896/jsjkx.200400043 |
[10] | 徐守坤, 倪楚涵, 吉晨晨, 李宁. 基于YOLOv3的施工场景安全帽佩戴的图像描述 Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3 计算机科学, 2020, 47(8): 233-240. https://doi.org/10.11896/jsjkx.190600109 |
[11] | 王萌, 丁志军. 一种新的设备指纹特征选择及模型构建方法 New Device Fingerprint Feature Selection and Model Construction Method 计算机科学, 2020, 47(7): 257-262. https://doi.org/10.11896/jsjkx.190900107 |
[12] | 朱林立, 华钢, 高炜. 决定图框架下本体学习算法的稳定性分析 Stability Analysis of Ontology Learning Algorithm in Decision Graph Setting 计算机科学, 2020, 47(5): 43-50. https://doi.org/10.11896/jsjkx.200100129 |
[13] | 周畅,陆慧梅,向勇,吴竞邦. 区块链在车载自组网中的应用研究及展望 Survey on Application of Blockchain in VANET 计算机科学, 2020, 47(2): 213-220. https://doi.org/10.11896/jsjkx.190600001 |
[14] | 邓定胜. 一种改进的DBSCAN算法在Spark平台上的应用 Application of Improved DBSCAN Algorithm on Spark Platform 计算机科学, 2020, 47(11A): 425-429. https://doi.org/10.11896/jsjkx.190700071 |
[15] | 田献珍, 孙立强, 田振中. 基于蚁群算法的图像重建 Image Reconstruction Based on Ant Colony Algorithm 计算机科学, 2020, 47(11A): 231-235. https://doi.org/10.11896/jsjkx.191000128 |
|