Computer Science ›› 2016, Vol. 43 ›› Issue (8): 258-261. doi: 10.11896/j.issn.1002-137X.2016.08.052

• Artificial Intelligence •

Anomaly Detection Algorithm Based on Improved K-means Clustering

ZUO Jin, CHEN Ze-mao

  1. Department of Information Security, Naval University of Engineering, Wuhan 430033
  • Online: 2018-12-01 Published: 2018-12-01

Abstract: This paper proposes an anomaly detection algorithm based on improved K-means clustering, which replaces the random selection of initial cluster centers used by traditional K-means. To choose the initial centers, the algorithm first computes the tightness of every data point, excludes outlier regions, and then selects K initial centers evenly from the densely populated regions. This avoids the tendency of random initialization to lead to a local optimum. Because the optimized selection places the initial centers closer to the true cluster centers before iteration begins, the number of iterations decreases, and both clustering quality and the anomaly detection rate improve. Experiments show that the improved algorithm clearly outperforms the original algorithm in both clustering performance and anomaly detection.

Key words: K-means, Clustering, Tightness, Anomaly detection
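
The initialization idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define tightness, the outlier cut-off, or the rule for spreading centers evenly, so everything concrete here is an assumption made for illustration: tightness is taken as the inverse of the mean distance to the `m` nearest neighbours, points below the average tightness are treated as outlier regions, and centers are spread with a maximin (farthest-point) rule. The function names `tightness` and `select_initial_centers` are hypothetical and are not taken from the paper.

```python
import math


def tightness(points, m=3):
    """Assumed tightness: inverse of the mean distance to the m nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dist = sum(dists[:m]) / m
        scores.append(1.0 / (mean_dist + 1e-12))  # epsilon guards against duplicate points
    return scores


def select_initial_centers(points, k, m=3):
    """Pick k spread-out initial centers from the dense part of the data."""
    if len(points) <= m:
        raise ValueError("need more than m points")
    scores = tightness(points, m)
    # Assumed outlier rule: drop points whose tightness is below the average.
    threshold = sum(scores) / len(scores)
    dense = [p for p, s in zip(points, scores) if s >= threshold]
    if len(dense) < k:
        raise ValueError("fewer dense points than requested centers")
    # Start from the tightest point, which always passes the threshold.
    first = max(range(len(points)), key=scores.__getitem__)
    centers = [points[first]]
    # Maximin rule: repeatedly add the dense point farthest from all chosen centers.
    while len(centers) < k:
        centers.append(max(dense, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


# Two compact groups plus one isolated point that should never become a center.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(select_initial_centers(data, k=2))
```

The returned centers could then seed any standard K-means iteration in place of random starting points. Under these assumptions the isolated point has a much lower tightness than the clustered points, so it is filtered out before center selection.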
