Computer Science ›› 2015, Vol. 42 ›› Issue (12): 247-250.

• Information Security •

A New Clustering Algorithm for Big Data Environments

LI Bin, WANG Jin-song, HUANG Wei

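  1. School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin 300384; National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300457; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300191 (the same affiliations are listed for all three authors)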
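  • Publication date: 2018-11-14  Release date: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61272450) and the Tianjin Science and Technology Support Program (14ZCZDGX00072).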

Novel Global Kmeans Clustering Algorithm for Big Data

LI Bin, WANG Jin-song and HUANG Wei   

  • Online:2018-11-14 Published:2018-11-14

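Abstract (translated from the Chinese): A new clustering algorithm, NGKCA, is proposed. It addresses the shortcomings of classical clustering algorithms in detection rate and stability and is suited to clustering problems in big data environments. NGKCA comprises four stages. First, the NJW spectral clustering algorithm is used to perform column dimension reduction and data normalization on the large data set. Second, particle swarm optimization, which is insensitive to initial values, is used to perform row reduction on the data set and select a temporary set of cluster centers. Third, the global Kmeans algorithm clusters the best set of cluster centers to obtain the cluster center points. Finally, particle swarm optimization adjusts the cluster center points to obtain the final partition. Experiments on several well-known machine learning data sets and on the international standard network security data set KDDCUP99 show that the proposed algorithm has better stability and a higher detection rate than common algorithms such as spectral clustering, Kmeans, particle swarm optimization, and global Kmeans, and better time complexity than global Kmeans.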

Keywords (translated from the Chinese): global Kmeans, spectral clustering, particle swarm optimization, clustering, KDDCUP99
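The global Kmeans stage named in the abstract can be illustrated with a minimal sketch of the standard incremental global k-means procedure: start from one center (the data mean), then add centers one at a time, trying every data point as the new center and keeping the trial with the lowest sum of squared errors. This is not the authors' NGKCA implementation; the function names (`kmeans`, `global_kmeans`), the iteration cap, and the toy data are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, centers, iters=100):
    """Lloyd iterations from the given centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Keep the old center if a cluster ends up empty.
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

def global_kmeans(points, k):
    """Grow the solution from 1 to k centers, one center at a time."""
    points = [tuple(p) for p in points]
    centers = [mean(points)]
    for _ in range(1, k):
        best = None
        # Try every data point as the initial position of the new center.
        for cand in points:
            trial = kmeans(points, centers + [cand])
            if best is None or trial[1] < best[1]:
                best = trial
        centers = best[0]
    return centers

# Hypothetical toy data: three tight groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
       (20, 0), (20, 1), (21, 0)]
print(sorted(global_kmeans(pts, 3)))
```

Each added center costs one k-means run per candidate point, which is why shrinking the candidate set before this stage, as the abstract describes, would reduce the running time.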

Abstract: Clustering methods for big data have attracted considerable interest in recent years. This paper proposes a novel global k-means clustering algorithm (NGKCA). The method comprises four phases: column dimension reduction, row reduction, global k-means clustering, and adjustment of the cluster center points. Column dimension reduction is performed by spectral clustering, and row reduction by particle swarm optimization (PSO). Once both reduction phases are complete, the global k-means clustering phase and the PSO adjustment phase follow. Experiments were carried out on several well-known machine learning data sets and on the standard network security data set KDDCUP99. The results show that NGKCA achieves better stability and a higher detection rate than common algorithms such as spectral clustering, k-means, PSO, and global k-means, and that its time complexity is better than that of the global k-means algorithm.

Key words: Global Kmeans, Spectral clustering, PSO, Clustering, KDDCUP99
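The final phase, in which PSO adjusts the cluster center points, might look roughly like the sketch below. The abstract does not give the particle encoding, the fitness function, or any parameter values, so everything here is an assumption: each particle is a flattened list of all center coordinates, the fitness is the sum of squared errors, and the names `pso_refine`, `w`, `c1`, `c2`, and `spread` are hypothetical.

```python
import random

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sum((x - y) ** 2 for x, y in zip(p, c)) for c in centers)
               for p in points)

def pso_refine(points, centers, n_particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, spread=0.5, seed=0):
    """Refine cluster centers with a basic global-best PSO (assumed settings)."""
    rng = random.Random(seed)
    dim = len(centers[0])
    flat = [float(v) for c in centers for v in c]

    def unflatten(pos):
        return [tuple(pos[i:i + dim]) for i in range(0, len(pos), dim)]

    def cost(pos):
        return sse(points, unflatten(pos))

    # Particle 0 is the incoming solution; the rest are noisy copies of it.
    swarm = [list(flat)] + [[v + rng.gauss(0.0, spread) for v in flat]
                            for _ in range(n_particles - 1)]
    vel = [[0.0] * len(flat) for _ in swarm]
    pbest = [list(p) for p in swarm]
    pbest_cost = [cost(p) for p in swarm]
    g = min(range(len(swarm)), key=pbest_cost.__getitem__)
    gbest, gbest_cost = list(pbest[g]), pbest_cost[g]

    for _ in range(iters):
        for i, pos in enumerate(swarm):
            for d in range(len(pos)):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[d])
                             + c2 * r2 * (gbest[d] - pos[d]))
                pos[d] += vel[i][d]
            c = cost(pos)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = list(pos), c
                if c < gbest_cost:
                    gbest, gbest_cost = list(pos), c
    return unflatten(gbest), gbest_cost

# Hypothetical toy data and slightly misplaced starting centers.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
       (20, 0), (20, 1), (21, 0)]
start = [(0, 0), (10, 10), (20, 0)]
refined, refined_cost = pso_refine(pts, start)
print(refined_cost <= sse(pts, start))
```

Because the incoming centers are kept as one particle and the global best is replaced only by a strictly better position, the returned cost in this sketch is never worse than that of the starting centers.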

