Computer Science, 2026, Vol. 53, Issue (5): 388-403. doi: 10.11896/jsjkx.250300131

• Computer Architecture

Novel Multi-task Federated Learning Based Approach for Detecting and Diagnosing Anomalies in Cloud Microservices

CHEN Peng1, HAO Junfeng1, XIA Yunni2, LI Xi1

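  1 School of Computer and Software Engineering, Xihua University, Chengdu 610039, China
    2 College of Computer, Chongqing University, Chongqing 400044, China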
  • Received: 2025-03-24  Revised: 2025-06-24  Online: 2026-05-08
  • Corresponding author: XIA Yunni (xiayunni@hotmail.com)
  • About author: CHEN Peng (chenpeng@mail.xhu.edu.cn)
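  • Supported by:
    National Natural Science Foundation of China (62172062); Innovative Research Group Project of the Natural Science Foundation of Sichuan Province (2024NSFTD0008); Science and Technology Program of Sichuan Province (2020JDRC0067, 2023JDRC0087)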



  • About the authors: CHEN Peng, born in 1979, Ph.D., professor, is an executive member of CCF (No. B3144M). His main research interests include cloud computing, service computing, and anomaly detection.
    XIA Yunni, born in 1980, Ph.D., professor, doctoral supervisor, is a member of CCF (No. 23641M). His main research interests include cloud computing, service computing, and edge computing.

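Abstract: Microservice architecture is widely used for application development in cloud environments. In essence, it builds an application from a set of small, functionally independent, autonomous services, which gives it high cohesion, high availability, low coupling, and good scalability. However, a microservice architecture is a highly dynamic distributed computing architecture. Detecting system anomalies in real time across independently deployed, distributed microservices is therefore very challenging. Beyond detection, determining the category of each detected anomaly is even more important in practice. To address these problems, this paper proposes MT-FL-SADD (Multi-Task Federated Learning based System Anomaly Detection and Diagnosis), a system anomaly detection and diagnosis method based on multi-task federated learning. First, a distributed learning framework based on multi-task federated learning (MT-FL) is proposed to build an anomaly detection and diagnosis model for each microservice. Second, a dual-network feature extractor is constructed to identify the complex system anomaly patterns and features that arise while microservices run. It is based on squeeze-and-excitation (SE) and external attention (EA) and is named SE-EA-EDN (SE and EA based Enhance Dual Network). It efficiently extracts features from the real-time monitoring data of the running microservice system. Finally, a local-global feature-based parallel knowledge transfer framework (LGF-PKT) is designed to update the weights of local and global features in parallel. To verify the effectiveness of the proposed method, comparative experiments were conducted on the microservice benchmark platforms Sock Shop and Train Ticket. Compared with other federated learning methods, MT-FL-SADD improves the average Macro F1 by 33.9% and the average Micro F1 by 33.4%. On SWaT, SMD, and SKAB, it also improves the average F1 by 2.2% over other federated learning methods.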
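The abstract builds the SE-EA-EDN feature extractor from two published building blocks. The first is squeeze-and-excitation (SE), which gates each channel. The second is external attention (EA), which attends over two small learnable memories. The sketch below applies each block to one window of monitoring metrics, where rows are time steps and columns are metrics. It is a minimal illustration in plain Python under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the function names;
- the window shape and the SE reduction ratio of 2;
- the fixed example weights;
- running SE and then EA in sequence.

The abstract does not say how the two networks of the dual extractor are combined.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excitation(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Hypothetical SE block over a window x of shape (T time steps, C metrics).

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    """
    t_len, channels = len(x), len(x[0])
    # Squeeze: average each metric (channel) over the time window.
    z = [sum(x[t][c] for t in range(t_len)) / t_len for c in range(channels)]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w1[h][c] * z[c] for c in range(channels))) for h in range(len(w1))]
    s = [sigmoid(sum(w2[c][h] * hidden[h] for h in range(len(hidden)))) for c in range(channels)]
    # Scale: reweight every channel of the input by its gate.
    return [[x[t][c] * s[c] for c in range(channels)] for t in range(t_len)]


def external_attention(x: Matrix, m_k: Matrix, m_v: Matrix) -> Matrix:
    """Hypothetical EA block with external memories m_k, m_v of shape (S, C).

    Double normalization: softmax over the T time steps (per memory unit),
    then L1 normalization of each time step's row over the S memory units.
    """
    attn = matmul(x, transpose(m_k))  # (T, S)
    t_len, s_len = len(attn), len(attn[0])
    for j in range(s_len):  # softmax down each column (time axis)
        col_max = max(attn[i][j] for i in range(t_len))
        exps = [math.exp(attn[i][j] - col_max) for i in range(t_len)]
        total = sum(exps)
        for i in range(t_len):
            attn[i][j] = exps[i] / total
    for i in range(t_len):  # L1-normalize each row (memory axis)
        row_sum = sum(attn[i])
        attn[i] = [v / row_sum for v in attn[i]]
    return matmul(attn, m_v)  # (T, C)


# Made-up window: 3 time steps x 4 metrics; all weights are illustrative only.
window = [[1.0, 0.5, 2.0, 0.0],
          [0.8, 0.4, 2.2, 0.1],
          [1.2, 0.6, 1.8, 0.0]]
w1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]      # C=4 -> 2
w2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # 2 -> C=4
m_k = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]     # S=2 memory units
m_v = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
features = external_attention(squeeze_excitation(window, w1, w2), m_k, m_v)
print(len(features), len(features[0]))  # -> 3 4
```

Each row of the doubly normalized attention map sums to 1. Every output row is therefore a convex combination of the rows of `m_v`. This keeps the EA output bounded regardless of metric scale. The SE gate lies in (0, 1), so it can only attenuate individual metric channels relative to one another.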

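Key words: Microservice architecture, Multi-task federated learning, Distributed computing architecture, System anomaly detection and diagnosis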



CLC number: TP181