Computer Science, 2026, Vol. 53, Issue (2): 216-226. doi: 10.11896/jsjkx.241200044

• Database & Big Data & Data Science •

Adaptive Data Stream Anomaly Detection Algorithm Based on Variable Density

TANG Chenghai1, YANG Yuqing1, YANG Haifeng1, CAI Jianghui2, ZHOU Lichan1   

  1 School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2 School of Computer Science and Technology, North University of China, Taiyuan 030051, China
  • Received:2024-12-06 Revised:2025-03-13 Published:2026-02-10
  • About author: TANG Chenghai, born in 2000, postgraduate. His main research interests include data mining and machine learning.
    YANG Yuqing, born in 1992, Ph.D., lecturer. Her main research interests include data mining and applications, intelligent optimization, and decision support.
  • Supported by:
    National Natural Science Foundation of China (62402332) and Shanxi Provincial Youth Science Foundation (202303021212223).

Abstract: A data stream is data generated at a high rate whose distribution changes over time. Data stream anomaly detection aims to find the data points that deviate from expected behavior, thereby supporting decision-making in fields such as medical treatment, industrial production, and finance. Existing data stream anomaly detection methods generally suffer from high parameter sensitivity, high time and space overhead, and difficult threshold selection. To address these problems, this paper proposes an adaptive data stream anomaly detection method based on variable density. First, the variable local outlier factor (VLOF) is defined. VLOF measures the density distribution of a data point by comparing how its local reachability density and local outlier factor change across parallel neighborhood windows with different k values, which reduces the inaccuracy caused by relying on a single neighborhood density measurement. Second, the relative growth rate and absolute mean rate of VLOF with respect to k are used to capture the dynamic trend of the data stream; data points that conform to this trend are defined as core points, and the core points are used to accelerate the identification of subsequent normal points. Finally, the relative growth rate and absolute mean rate serve as indicators of the theoretical distribution of data points, and the difference between the theoretical distribution and the actual distribution of each new data point is calculated, so that points deviating from the theoretical distribution are identified as anomalies. To verify the effectiveness of the proposed algorithm, comparison experiments are conducted against 8 algorithms on multiple UCI datasets and real-world datasets. The experimental results show that, compared with the baseline models, the proposed method performs well on accuracy, recall, and F1, and also improves time and space efficiency.
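
The abstract does not give the VLOF formula, so the following is only a rough sketch of the underlying idea of evaluating a local outlier factor under several k values at once. It computes the classical LOF (from local reachability density) for each k and then summarizes how the score behaves across k. The function names `lof_scores`, `lof_profile`, and `profile_summary`, the choice of k values, and the two summary statistics are assumptions made for illustration; they are not the paper's definitions of VLOF, relative growth rate, or absolute mean rate.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Classical LOF of every point, using exactly k neighbours per point."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicates

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neigh[i]) / len(neigh[i])
        for i in range(n)
    ]


def lof_profile(points, ks):
    """LOF of every point under each k; result[i] is aligned with ks."""
    per_k = [lof_scores(points, k) for k in ks]
    return [[scores[i] for scores in per_k] for i in range(len(points))]


def profile_summary(profile):
    """Illustrative statistics only: mean LOF across k, and mean absolute
    change between consecutive k values (assumed stand-ins, not the paper's)."""
    mean_lof = sum(profile) / len(profile)
    steps = [abs(b - a) for a, b in zip(profile, profile[1:])]
    mean_step = sum(steps) / len(steps) if steps else 0.0
    return mean_lof, mean_step


# Toy data: a 5x5 grid of "normal" points plus one isolated point.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points.append((10.0, 10.0))

profiles = lof_profile(points, ks=[3, 5, 7])
summaries = [profile_summary(p) for p in profiles]
most_anomalous = max(range(len(points)), key=lambda i: summaries[i][0])
print(most_anomalous, summaries[most_anomalous])
```

On this toy data the isolated point keeps a high LOF under every k, while grid points stay close to 1, which is the kind of multi-k evidence the abstract describes as more robust than a single neighborhood measurement. The brute-force neighbor search is quadratic in the number of points and is not meant to reflect the efficiency claims of the proposed method.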

Key words: Data stream anomaly detection, Variable density, Variable local outlier factor, Core point, Adaptive threshold

CLC Number: 

  • TP311