Computer Science (计算机科学) ›› 2018, Vol. 45 ›› Issue (11): 244-248. doi: 10.11896/j.issn.1002-137X.2018.11.038

• Artificial Intelligence •

K-CFSFDP Clustering Algorithm Based on Kernel Density Estimation

DONG Xiao-jun, CHENG Chun-ling

  1. (College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
  • Received: 2017-10-27 Published: 2019-02-25
  • About the authors: DONG Xiao-jun (born 1993), male, master's student; his main research interest is data mining. E-mail: dongxiaojun_njupt@163.com. CHENG Chun-ling (born 1972), female, professor, CCF member; her main research interests include data management and resource management and optimization in cloud computing. E-mail: chengcl@njupt.edu.cn (corresponding author).

Abstract: Clustering by Fast Search and Find of Density Peaks (CFSFDP) is a recent density-based clustering algorithm that identifies cluster centers by locating density peaks; it is fast and simple to implement. Its accuracy, however, depends on how the density of the dataset is estimated and on a manually chosen cutoff distance (dc). To address this, a K-CFSFDP algorithm based on kernel density estimation is proposed. The algorithm uses nonparametric kernel density estimation to analyze the distribution of the data points and selects dc adaptively; it then searches for the density peaks of the data points and uses the peak points as the initial cluster centers. Simulation results on four typical datasets show that K-CFSFDP achieves higher accuracy and greater robustness than CFSFDP, K-means, and DBSCAN.

Key words: Cluster center, Clustering, Density peak, Kernel density estimation

CLC Number: TP311
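
The density-peak idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' K-CFSFDP implementation: the abstract does not state how dc is selected adaptively, so the kernel `bandwidth` below is a plain parameter supplied by the caller, and the function names (`pairwise_distances`, `gaussian_kernel_density`, `density_peaks`) are hypothetical. The sketch scores each point with a Gaussian kernel density, measures its distance to the nearest point of higher density, picks the points with the largest product of the two as centers, and assigns every other point to the cluster of its nearest higher-density neighbor.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def gaussian_kernel_density(dist, bandwidth):
    """Unnormalized Gaussian-kernel density score of each point."""
    return [
        sum(math.exp(-0.5 * (d / bandwidth) ** 2) for d in row)
        for row in dist
    ]


def density_peaks(points, n_clusters, bandwidth):
    """Cluster points by density peaks; returns (labels, center_indices).

    Assumption: `bandwidth` is chosen by the caller; the adaptive
    selection described in the paper is not reproduced here.
    """
    n = len(points)
    dist = pairwise_distances(points)
    rho = gaussian_kernel_density(dist, bandwidth)

    # Visit points from densest to sparsest.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)

    # delta[i]: distance to the nearest point of higher density.
    # parent[i]: index of that point (-1 for the densest point).
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers: the points with the largest rho * delta.
    centers = sorted(
        range(n), key=lambda i: rho[i] * delta[i], reverse=True
    )[:n_clusters]

    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label
    # Each remaining point inherits the label of its nearest
    # higher-density neighbor, which has already been labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Toy input: two well-separated groups of five points each.
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1),
    (5.0, 5.0), (5.2, 5.1), (4.9, 5.2), (5.1, 4.8), (4.8, 4.9),
]
labels, centers = density_peaks(points, n_clusters=2, bandwidth=0.5)
print(labels, centers)
```

On this toy input the middle point of each group (indices 0 and 5) is selected as a center, and each group receives a single label. Taking the number of clusters as an argument is a simplification for the sketch; the centers could equally be passed to K-means as initial centers, in line with the abstract's description of peak points as initial cluster centers.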