Computer Science, 2024, Vol. 51, Issue (6A): 230400016-8. doi: 10.11896/jsjkx.230400016

• Network & Communication •

Research Progress of Anomaly Detection in IaaS Cloud Operation Driven by Deep Learning

SI Jia1, LIANG Jianfeng1, XIE Shuo1, DENG Yingjun2

  1 National Marine Data and Information Service, Tianjin 300171, China
  2 Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
  • Published: 2024-06-06
  • Corresponding author: DENG Yingjun (yingjun.deng@tju.edu.cn)
  • About author: SI Jia, born in 1994, postgraduate, assistant engineer (sijia@nmdis.org.cn). Her main research interests include marine information systems.
    DENG Yingjun, born in 1986, Ph.D., lecturer. His main research interests include predictive maintenance and machine learning.
  • Supported by:
    National Marine Data and Information Service Youth Fund Project (202102006) and Open Fund of State Key Laboratory of Marine Resources Utilization in South China Sea (MRUKF2021035).

Abstract: Anomaly detection is a key task in the operation and maintenance (O&M) of IaaS cloud systems: early warning and timely intervention can prevent serious incidents such as system crashes. Compared with traditional data centers, however, IaaS cloud systems have many more computing nodes, complex node topologies, large volumes of monitoring data, and few data labels, all of which pose new challenges for anomaly detection. Starting from the technical framework of deep learning, this paper analyzes the difficulties of the anomaly detection problem and surveys common anomaly detection algorithms and related techniques for IaaS cloud systems. It reviews deep-learning-driven solutions for two typical problems: node-level anomalies and system-level anomalies. For node-level anomalies, it focuses on detection algorithms driven by time series data, since O&M monitoring data are time-dependent. For system-level anomalies, it focuses on detection algorithms driven by graph data obtained by modeling the network topology. Finally, it outlines new problems and challenges in data-driven anomaly detection for IaaS cloud O&M.

Key words: Anomaly detection, IaaS cloud, Time series data, Graph data, Deep learning, Machine learning
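The two-level framing in the abstract (node-level anomalies found in each node's time series, system-level anomalies found over the node topology) can be illustrated with a minimal sketch. It is not the method of this survey or of any paper it covers: it swaps in simple non-learned techniques, a rolling z-score in place of deep sequence models and one round of neighbour averaging in place of a trained graph neural network. The function names, metric traces, and topology are hypothetical.

```python
from statistics import mean, pstdev


# Node level: score each KPI sample against the window of samples just before it.
# A rolling z-score is a classical stand-in for the deep sequence models
# (e.g. LSTM, TCN, Transformer) used by time-series-driven detectors.
def node_scores(series, window=5, eps=1e-9):
    scores = [0.0] * len(series)  # the first `window` points have no history
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu, sigma = mean(past), pstdev(past)
        # eps avoids division by zero when the past window is constant
        scores[t] = abs(series[t] - mu) / (sigma + eps)
    return scores


# System level: one round of neighbour averaging over the node topology,
# an unlearned stand-in for the message passing of a graph neural network.
def propagate(scores, adjacency, alpha=0.5):
    out = {}
    for node, s in scores.items():
        neighbours = adjacency.get(node, [])
        # isolated nodes keep their own score
        nbr_mean = mean(scores[n] for n in neighbours) if neighbours else s
        out[node] = (1 - alpha) * s + alpha * nbr_mean
    return out


if __name__ == "__main__":
    # Hypothetical CPU-utilisation traces for three nodes (made-up data).
    metrics = {
        "node-a": [10, 10, 11, 10, 11, 10, 50],
        "node-b": [20, 21, 20, 21, 20, 21, 20],
        "node-c": [30, 30, 31, 30, 31, 30, 31],
    }
    # Hypothetical topology: node-a <-> node-b <-> node-c
    topology = {
        "node-a": ["node-b"],
        "node-b": ["node-a", "node-c"],
        "node-c": ["node-b"],
    }
    latest = {name: node_scores(s)[-1] for name, s in metrics.items()}
    print(propagate(latest, topology))
```

With these made-up traces, node-a's final sample scores far above the others; after propagation, node-b's score rises because its neighbour is anomalous, while node-a's is damped. A learned graph model would fit such mixing weights from data instead of fixing `alpha` by hand.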

CLC number: TP311.1