Computer Science ›› 2019, Vol. 46 ›› Issue (7): 211-216. doi: 10.11896/j.issn.1002-137X.2019.07.032

• Artificial Intelligence •

Study on Ocean Data Anomaly Detection Algorithm Based on Improved K-means Clustering

JIANG Hua, WU Yao, WANG Xin, WANG Hui-jiao

  1. (College of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
  • Received: 2018-06-06 Online: 2019-07-15 Published: 2019-07-15
  • About the authors: JIANG Hua (born 1963), male, Ph.D., professor; main research interest: information security. WU Yao (born 1994), male, M.S.; main research interests: information security, data mining, and ocean big data; E-mail: 907149625@qq.com (corresponding author). WANG Xin (born 1976), male, M.S., associate professor; main research interest: wireless sensor networks. WANG Hui-jiao (born 1976), female, M.S., associate professor; main research interest: wireless sensor networks.
  • Supported by:
    Guangxi Science and Technology Major Project (AA18118025) and the Graduate Education Innovation Program of Guilin University of Electronic Technology (2017YJCX48)

Abstract: To address the problem of mining anomalous data in marine Argo buoy monitoring data, this paper proposes an anomaly detection algorithm that builds on an improved K-means algorithm and uses distance as the criterion for identifying anomalous ocean data. The algorithm redefines the proximity measure for ocean data and, according to the size and distribution of the data, selects candidate initial cluster centers in a block-wise, adaptive manner. During iteration, it uses the mean distance between the data objects in a cluster and the cluster center to examine, across all clusters, the data objects that match the anomaly characteristics, and flags them as anomalies. Experiments on a simulated dataset and a real dataset show that the algorithm outperforms the compared algorithms in both clustering performance and anomaly detection.

Key words: Anomaly detection, Argo buoy data, Block, K-means algorithm, Proximity
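The abstract names two steps without giving their details: block-wise selection of candidate initial centers, and anomaly detection based on the mean within-cluster distance to the cluster center. The sketch below is one possible reading of those steps, not the authors' implementation. Everything specific in it is an assumption made for illustration: plain Euclidean distance stands in for the paper's redefined proximity measure, blocks are formed by sorting along the widest coordinate, and the function names and the `factor` threshold are hypothetical.

```python
import math


def euclidean(a, b):
    # Assumption: Euclidean distance stands in for the paper's redefined proximity.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    # Coordinate-wise mean of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def blockwise_initial_centers(points, k):
    # Hypothetical block-wise seeding: sort along the coordinate with the
    # largest range, cut the sorted list into k contiguous blocks, and use
    # each block's mean as a candidate initial center.
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dims = range(len(points[0]))
    dim = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[dim])
    n = len(ordered)
    return [mean_point(ordered[i * n // k:(i + 1) * n // k]) for i in range(k)]


def assign(points, centers):
    # Put every point into the cluster of its nearest center.
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, k, iterations=20):
    # Standard K-means updates, started from the block-wise seeds.
    centers = blockwise_initial_centers(points, k)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, assign(points, centers)


def flag_anomalies(centers, clusters, factor=2.0):
    # Distance criterion (assumed form): a point is anomalous when its distance
    # to its cluster center exceeds `factor` times the cluster's mean distance.
    anomalies = []
    for center, members in zip(centers, clusters):
        if not members:
            continue
        dists = [euclidean(p, center) for p in members]
        mean_dist = sum(dists) / len(dists)
        anomalies.extend(p for p, d in zip(members, dists) if d > factor * mean_dist)
    return anomalies
```

A typical call would be `centers, clusters = kmeans(points, k)` followed by `flag_anomalies(centers, clusters)`. The multiplier `factor` controls how far from the center a point may lie before it is reported, and would need tuning on real Argo profiles.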

CLC Number: TP391