Computer Science, 2018, Vol. 45, Issue (2): 287-290. doi: 10.11896/j.issn.1002-137X.2018.02.049

• Artificial Intelligence •

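Study on Fast Incremental Clustering Algorithm for High Complexity Dynamic Data in Cloud Computing Environment

CHEN Gan-lang (陈赣浪), YAN Fei-long (颜飞龙) and PAN Jia-hui (潘家辉)

  1. School of Software, South China Normal University, Nanhai, Guangdong 528225; School of Education Science, South China Normal University, Guangzhou 510631; School of Software, South China Normal University, Nanhai, Guangdong 528225
  • Online: 2018-02-15  Published: 2018-11-13
  • Supported by:
    the Young Scientists Fund of the National Natural Science Foundation of China (61503143) and the Doctoral Research Start-up Project of the Natural Science Foundation of Guangdong Province (2014A030310244)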

Abstract: Traditional clustering algorithms suffer from high overhead, poor clustering quality, and slow clustering speed. To address these problems, this paper proposes a fast incremental density-based clustering algorithm for high-complexity dynamic data in cloud computing environments. First, the data are clustered by density: the algorithm finds subspaces of the data space such that mapping the data into them yields regions of high point density, and each set of connected high-density regions is taken as a cluster. Second, incremental clustering is carried out with the DBSCAN algorithm, and the merging or splitting of existing clusters caused by inserting or deleting data is analyzed. Finally, during each update, the algorithm processes all core data objects contained in the neighborhoods of the data objects whose core status has changed, and incremental clustering is analyzed for both insertion and deletion. Experimental results show that the proposed algorithm has low overhead, fast clustering speed, and high clustering quality.

Key words: Cloud computing environment, High complexity, Dynamic data, Incremental density, Fast clustering
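The insertion half of the incremental update described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `IncrementalDBSCAN`, the parameters `eps` and `min_pts`, and the brute-force neighbor search are all assumptions made for illustration, and deletion (which can split a cluster) is left out. The sketch relies only on the standard DBSCAN definitions: when a point is inserted, only points inside its `eps`-neighborhood can change core status, so the update looks at the core points in the neighborhoods of those newly core points and then creates, extends, or merges clusters accordingly.

```python
import math

NOISE = -1  # label for points that belong to no cluster


class IncrementalDBSCAN:
    """Insert-only incremental DBSCAN sketch (hypothetical names, brute-force search)."""

    def __init__(self, eps, min_pts):
        self.eps = eps            # neighborhood radius
        self.min_pts = min_pts    # neighbors (including the point itself) needed to be core
        self.points = []          # coordinate tuples
        self.labels = []          # cluster id per point, NOISE if unassigned
        self.next_id = 0

    def _neighbors(self, i):
        p = self.points[i]
        return [j for j, q in enumerate(self.points) if math.dist(p, q) <= self.eps]

    def _is_core(self, i):
        return len(self._neighbors(i)) >= self.min_pts

    def _merge(self, ids):
        # Relabel every cluster in `ids` to the smallest id among them.
        target = min(ids)
        self.labels = [target if lab in ids else lab for lab in self.labels]
        return target

    def insert(self, point):
        self.points.append(tuple(point))
        self.labels.append(NOISE)
        new = len(self.points) - 1
        nbrs = self._neighbors(new)  # includes the new point itself

        # Points that became core because of this insertion: the new point if it
        # is core, and neighbors whose neighborhood size just reached min_pts.
        new_cores = [q for q in nbrs
                     if self._is_core(q)
                     and (q == new or len(self._neighbors(q)) == self.min_pts)]

        for q in new_cores:
            q_nbrs = self._neighbors(q)
            # Clusters of the core points around q are now density-connected via q.
            ids = {self.labels[r] for r in q_nbrs
                   if self._is_core(r) and self.labels[r] != NOISE}
            if ids:
                target = self._merge(ids)      # absorption (one id) or merge (several)
            else:
                target = self.next_id          # creation of a new cluster
                self.next_id += 1
            for r in q_nbrs:
                if self.labels[r] == NOISE:    # former noise joins q's cluster
                    self.labels[r] = target

        # No new core reached the new point: it is a border point if some
        # existing core point lies in its neighborhood, otherwise noise.
        if self.labels[new] == NOISE:
            for r in nbrs:
                if r != new and self._is_core(r):
                    self.labels[new] = self.labels[r]
                    break
        return self.labels[new]
```

For example, with `eps=1.0` and `min_pts=3`, inserting `(0, 0)`, `(0.5, 0)`, `(1, 0)` creates one cluster and inserting `(5, 0)`, `(5.5, 0)`, `(6, 0)` creates a second. Inserting `(2, 0)` and `(4, 0)` attaches one border point to each, and inserting `(3, 0)` then turns all three bridging points into core points, which merges the two clusters. Border points that are reachable from more than one cluster keep their existing label here, which is one of several reasonable tie-breaking choices.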

