Computer Science ›› 2026, Vol. 53 ›› Issue (3): 143-150. doi: 10.11896/jsjkx.241200100

• Database & Big Data & Data Science •

Bayesian Network Based Fault Root Cause Analysis

LIU Huashuai, TAO Houguo, YUE Kun, DUAN Liang

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2024-12-13 Revised: 2025-03-08 Online: 2026-03-12
  • Corresponding author: DUAN Liang (duanl@ynu.edu.cn)
  • About author (avalon@mail.ynu.edu.cn): LIU Huashuai, born in 2003, postgraduate. His main research interest is data and knowledge engineering.
    DUAN Liang, born in 1986, Ph.D, associate professor, is a member of CCF (No.95258M). His main research interests include graph analysis and knowledge engineering.
  • Supported by:
    Major Project of Science and Technology of Yunnan Province (202202AD080001) and Xingdian Young Talent Program of Yunnan Province (C6213001195).

Abstract: Fault root cause analysis aims to find the causes of specific problems, faults, or events, and is an important supporting technique for tracing and provenance in many fields. However, existing methods still fall short of the practical requirements of this task in efficiency, accuracy, and stability. To address this, the Bayesian network (BN) is adopted as the knowledge framework for representing and inferring the dependencies among relevant attributes, and a BN-based fault root cause analysis method is proposed. First, to cope with high-dimensional data and sparse samples, a high-dimensional attribute reduction algorithm based on the vector quantized variational autoencoder is proposed, and an α-BIC scoring criterion is given to learn the root cause Bayesian network (RCBN) efficiently. Then, efficient inference in the RCBN is implemented with BN embedding to compute the probability of fault occurrence under each cause. Finally, the Blame mechanism from causal models is used to measure the contribution of each cause to a given fault, which completes the root cause analysis. Experimental results on 3 public datasets and 3 synthetic datasets show that the average detection accuracy and efficiency of the proposed method are clearly better than those of the comparison methods; on the CHILD dataset, precision improves by 7% and running time is 60% faster.

Key words: Fault root cause analysis, Bayesian network, Vector quantized variational autoencoder, Bayesian information criterion, Root cause contribution
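
The abstract names an α-BIC scoring criterion for structure learning but does not define it. As a rough illustration only, the sketch below computes the standard BIC score of one node given a candidate parent set on discrete data, and assumes (hypothetically) that α simply scales the complexity penalty. The function name `family_score`, the `alpha` parameter, and the toy attribute names are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter


def family_score(rows, child, parents, alpha=1.0):
    """Penalized log-likelihood of `child` given `parents` on discrete data.

    rows  : list of dicts, attribute name -> discrete value
    alpha : ASSUMED weight on the complexity penalty (alpha=1.0 is plain BIC);
            the paper's actual alpha-BIC definition is not given in the abstract.
    """
    n = len(rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    joint_counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)

    # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij).
    loglik = sum(c * math.log(c / parent_counts[pa])
                 for (pa, _), c in joint_counts.items())

    # Free parameters (r - 1) * q, with state counts taken from the data.
    child_states = len({r[child] for r in rows})
    parent_configs = math.prod(len({r[p] for r in rows}) for p in parents)
    penalty = 0.5 * math.log(n) * (child_states - 1) * parent_configs
    return loglik - alpha * penalty


# Toy data (hypothetical attribute names): the fault copies the cause exactly.
rows = [{"cause": 0, "fault": 0}] * 4 + [{"cause": 1, "fault": 1}] * 4
with_parent = family_score(rows, "fault", ["cause"])
without_parent = family_score(rows, "fault", [])
print(with_parent > without_parent)
```

A score-based structure learner would compare such family scores across candidate parent sets and keep the higher-scoring one; here the edge from `cause` to `fault` wins because the gain in likelihood outweighs the extra parameters.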

CLC Number:

  • TP391