Computer Science, 2026, Vol. 53, Issue (2): 216-226. doi: 10.11896/jsjkx.241200044

• Database & Big Data & Data Science •

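Adaptive Data Stream Anomaly Detection Algorithm Based on Variable Density

TANG Chenghai1, YANG Yuqing1, YANG Haifeng1, CAI Jianghui2, ZHOU Lichan1

  1 School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
    2 School of Computer Science and Technology, North University of China, Taiyuan 030051, China
  • Received: 2024-12-06 Revised: 2025-03-13 Online: 2026-02-10
  • Corresponding author: YANG Yuqing (2022066@tyust.edu.cn)
  • Author email: 19940522783@163.com
  • Supported by:
    National Natural Science Foundation of China Young Scientists Fund (62402332); Shanxi Provincial Youth Science Foundation (202303021212223)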

  • About author: TANG Chenghai, born in 2000, postgraduate. His main research interests include data mining and machine learning.
    YANG Yuqing, born in 1992, Ph.D., lecturer. Her main research interests include data mining and its applications, intelligent optimization, and decision support.

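Abstract: A data stream is a type of data with a high generation rate and a dynamically changing distribution. Anomaly detection on data streams aims to find data that deviate from expected behavior, supporting decision-making in fields such as healthcare, industrial production, and finance. Existing data stream anomaly detection methods commonly suffer from high parameter sensitivity, large time and space overhead, and difficulty in choosing a threshold. To address these problems, this paper proposes a variable-density-based adaptive anomaly detection method for data streams. First, the Variable Local Outlier Factor (VLOF) is defined. VLOF measures the density distribution of a data point by comparing how its local reachability density and local outlier factor change across parallel neighborhood windows with different k values, which reduces the inaccuracy caused by a density measure based on a single k-nearest-neighbor setting. Second, the relative growth rate and absolute mean rate of VLOF with respect to k are computed to reflect the dynamic trend of the data stream; data points that conform to this trend are defined as core points, and the core points are used to speed up the identification of subsequent normal points. Finally, the relative growth rate and absolute mean rate serve as measures of the theoretical distribution of data points, and the difference between this theoretical distribution and the actual distribution of each new data point is computed, so that points deviating from the theoretical distribution are adaptively identified as anomalies. To verify the effectiveness of the proposed algorithm, comparative experiments against 8 algorithms are conducted on multiple UCI datasets and real-world datasets. The results show that, compared with the baseline models, the proposed method performs well in precision, recall, and F1, and also improves time and space efficiency.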

Key words: Data stream anomaly detection, Variable density, Variable local outlier factor, Core point, Adaptive threshold
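The abstract does not give the exact formulas for VLOF, the relative growth rate, the absolute mean rate, or the adaptive threshold. As a rough illustration only, the following Python sketch computes the standard local reachability density and local outlier factor (LOF) of every point in a window under several k values at once, which is the multi-k comparison that VLOF builds on. The helper names (`lof_profile`, `summarize`, `flag_outliers`), the two summary statistics (mean LOF over k, and mean relative change of LOF between consecutive k), and the mean + c·std threshold are hypothetical stand-ins, not the paper's definitions. The sketch also processes one fixed window rather than an incremental stream and omits core points.

```python
import math
from statistics import mean, pstdev


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def lof_scores(points, k, eps=1e-12):
    """Standard LOF, simplified to exactly k neighbours (ties at the k-distance
    are dropped). Requires k < len(points)."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance of each point: distance to its k-th nearest neighbour.
    kdist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(kdist[o], math.dist(points[i], points[o])) for o in neigh[i]]
        lrd.append(1.0 / (mean(reach) + eps))  # eps guards against duplicate points
    # LOF: mean density of the neighbours relative to the point's own density.
    return [mean(lrd[o] for o in neigh[i]) / lrd[i] for i in range(n)]


def lof_profile(points, ks):
    """LOF of every point under several k values ("parallel" neighbourhood windows)."""
    per_k = [lof_scores(points, k) for k in ks]
    return [[scores[i] for scores in per_k] for i in range(len(points))]


def summarize(row):
    """Hypothetical summaries of one point's LOF-vs-k profile (needs >= 2 k values):
    the mean LOF over k, and the mean relative change of LOF between consecutive k."""
    growth = mean((b - a) / a for a, b in zip(row, row[1:]))
    return mean(row), growth


def flag_outliers(points, ks=(3, 5, 7), c=2.0):
    """Hypothetical adaptive rule: flag points whose mean LOF exceeds
    mean + c * std of the window's mean-LOF values. This is a generic stand-in,
    not the paper's theoretical-vs-actual distribution test."""
    means = [summarize(row)[0] for row in lof_profile(points, ks)]
    threshold = mean(means) + c * pstdev(means)
    return [i for i, m in enumerate(means) if m > threshold]


if __name__ == "__main__":
    # A dense 4x4 grid plus one distant point at index 16.
    window = [(x, y) for x in range(4) for y in range(4)] + [(20, 20)]
    print(flag_outliers(window))
```

On the 4×4 grid plus the distant point (20, 20), every grid point has an LOF close to 1 for each k, while the distant point's LOF is far above 1 for all k, so it is the only index flagged. Computing LOF for several k values side by side is what makes the profile less dependent on any single choice of k; how the paper turns that profile into VLOF and its thresholds is not specified in the abstract.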

CLC number: TP311