Computer Science, 2019, Vol. 46, Issue (6A): 457-460.

• Big Data & Data Mining •

Density Peak Clustering Algorithm Based on Grid Data Center

LI Xiao-guang, SHAO Chao

  1. School of Computer & Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online: 2019-06-14 Published: 2019-07-02
  • Corresponding author: SHAO Chao (born 1977), male, Ph.D., professor, CCF member; main research interest: machine learning. E-mail: sc_flying@163.com
  • About the author: LI Xiao-guang (born 1994), male, master's student; main research interest: machine learning.
  • Supported by:
    The National Natural Science Foundation of China (61202285, 61502146).

Abstract: To reduce the computational complexity of clustering, a density peak clustering algorithm based on grid data centers is proposed, which operates on a grid partition of the dataset rather than on individual data points. First, the data space is divided into grid cells of equal size, forming a set of grid objects. The local density of each grid object is the number of data points that fall inside the cell plus the decayed number of data points in its adjacent cells, and its relative distance is the shortest distance from the cell's data center to the data center of a cell with higher density. Then, the cluster-center grid objects are identified as those that have both a high local density and a large relative distance. Finally, a density-based partitioning method completes the clustering. Simulation experiments on UCI artificial datasets show that the algorithm can process large-scale data effectively in a short time, with high clustering accuracy.

Key words: Clustering, Data center, Decision graph, Density peak, Grid
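
The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the decay factor, the neighbourhood shape, how a cell's data center is computed, or how the cluster-center cells are picked from the decision graph. The sketch assumes a fixed `decay` weight, the full surrounding neighbourhood of each cell, the mean of a cell's points as its data center, and selection of the `n_clusters` cells with the largest density-distance product (the densest cell is always kept as a center). The function name `grid_density_peaks` and all parameter names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def grid_density_peaks(points, cell_size, n_clusters, decay=0.5):
    """Illustrative grid-based density peak clustering; returns one label per point."""
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]

    # 1. Gridding: group point indices by the integer index of their cell.
    members = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(int((p[d] - lows[d]) // cell_size) for d in range(dim))
        members[key].append(i)
    members = dict(members)
    cells = list(members)

    # Data center of a cell (assumption: the mean of the points inside it).
    center = {
        c: tuple(sum(points[i][d] for i in members[c]) / len(members[c])
                 for d in range(dim))
        for c in cells
    }

    # 2. Local density: own count plus decayed counts of adjacent cells.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    rho = {}
    for c in cells:
        adjacent = (tuple(a + b for a, b in zip(c, o)) for o in offsets)
        rho[c] = len(members[c]) + decay * sum(
            len(members[n]) for n in adjacent if n in members)

    # 3. Relative distance: distance from a cell's data center to the nearest
    #    data center among cells ranked higher in density.
    order = sorted(cells, key=lambda c: -rho[c])
    delta, parent = {}, {}
    for rank, c in enumerate(order[1:], start=1):
        parent[c] = min(order[:rank],
                        key=lambda h: math.dist(center[h], center[c]))
        delta[c] = math.dist(center[parent[c]], center[c])
    delta[order[0]] = max(delta.values(), default=0.0)

    # 4. Cluster-center cells (assumption): the densest cell plus the cells
    #    with the largest rho * delta among the rest.
    ranked = sorted(order[1:], key=lambda c: -rho[c] * delta[c])
    peaks = [order[0]] + ranked[:n_clusters - 1]
    label = {c: k for k, c in enumerate(peaks)}

    # 5. Density-based assignment: in descending density order, each remaining
    #    cell inherits the label of its nearest higher-density cell.
    for c in order:
        if c not in label:
            label[c] = label[parent[c]]

    result = [0] * len(points)
    for c in cells:
        for i in members[c]:
            result[i] = label[c]
    return result
```

In this sketch the pairwise distance search runs over non-empty cells rather than over individual points, which is where the reduction in computational cost described in the abstract would come from; `cell_size` therefore trades speed against resolution.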

CLC Number:

  • TP181