Computer Science, 2026, Vol. 53, Issue (3): 143-150. doi: 10.11896/jsjkx.241200100
LIU Huashuai (刘华帅), TAO Houguo (陶厚国), YUE Kun (岳昆), DUAN Liang (段亮)
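Abstract: Fault root cause analysis aims to identify the causes of a specific problem, fault, or event. It is an important enabling technique for tracing faults back to their origin in many fields. However, existing methods still fall short of the practical requirements of root cause analysis in efficiency, accuracy, and stability. To address this, we use the Bayesian network (BN) as a knowledge framework for representing and reasoning about the dependencies among relevant attributes, and we propose a BN-based fault root cause analysis method. First, to handle the challenges posed by high-dimensional data and sparse samples, we propose a high-dimensional attribute reduction algorithm based on a vector-quantized autoencoder. We also give an α-BIC scoring criterion, which together with this algorithm allows the Root Cause Bayesian Network (RCBN) to be learned efficiently. Next, we use BN embedding to perform efficient inference on the RCBN. This computes the probability that the fault occurs conditioned on each candidate cause. We then use the Blame mechanism from causal models to measure each cause's contribution to the given fault, which identifies its root cause. Experimental results on three public and three synthetic datasets show that the proposed method clearly outperforms the compared methods in average detection accuracy and efficiency. On the CHILD dataset, it improves accuracy by 7% and runs 60% faster.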
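The abstract names an α-BIC scoring criterion for learning the RCBN but does not define it. The following minimal Python sketch is a point of reference only. It computes the standard decomposable BIC score that score-based structure learning commonly maximizes: for each node, the maximum-likelihood log-likelihood minus (k/2) log N. It is not the paper's α-BIC. The function names, the list-of-dicts data format, and the example data are assumptions made for illustration.

```python
import math
from collections import Counter


def bic_family_score(data, child, parents, arity):
    """Standard BIC contribution of one node given its parent set.

    Note: this is ordinary BIC; the paper's alpha-BIC variant is not reproduced here.
    data:    list of dicts mapping variable name -> discrete state (0..arity-1)
    parents: tuple of parent variable names
    arity:   dict mapping variable name -> number of states
    """
    n = len(data)
    # N_ijk: counts of (parent configuration, child state)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: counts of each parent configuration
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    # Free parameters: (r_child - 1) * product of parent arities (empty product = 1)
    k = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * k * math.log(n)


def bic_score(data, structure, arity):
    """Decomposable BIC: sum of family scores; structure maps child -> tuple of parents."""
    return sum(bic_family_score(data, c, ps, arity) for c, ps in structure.items())


# Hypothetical data: B agrees with A in 90 of 100 records.
data = ([{"A": 0, "B": 0}] * 45 + [{"A": 1, "B": 1}] * 45
        + [{"A": 0, "B": 1}] * 5 + [{"A": 1, "B": 0}] * 5)
arity = {"A": 2, "B": 2}
with_edge = bic_score(data, {"A": (), "B": ("A",)}, arity)
no_edge = bic_score(data, {"A": (), "B": ()}, arity)
print(with_edge > no_edge)  # True
```

Because the score decomposes by node, a search procedure only needs to rescore the families whose parent sets change. In this toy case the edge A → B wins because its gain in log-likelihood outweighs the penalty for the extra parameter.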
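The abstract computes the probability that the fault occurs given each candidate cause, and it uses BN embedding to do this efficiently. The embedding technique is not described here. As a hedged reference point, the sketch below computes the same kind of conditional probability, P(F = 1 | C = 1), by exact brute-force enumeration. It uses a tiny hypothetical network of binary variables. Enumeration costs O(2^n) in the number of variables and is only practical for very small networks. The network, the CPT values, and the function names are illustrative assumptions.

```python
import itertools


def joint_prob(assign, bn):
    """Product of CPT entries for one full assignment of binary variables.

    bn maps var -> (parents tuple, cpt), where cpt maps a tuple of parent
    values to P(var = 1 | parents).
    """
    p = 1.0
    for var, (parents, cpt) in bn.items():
        p1 = cpt[tuple(assign[q] for q in parents)]
        p *= p1 if assign[var] == 1 else 1.0 - p1
    return p


def conditional(bn, query, evidence):
    """P(query = 1 | evidence) by exact enumeration over all 2^n assignments."""
    variables = list(bn)
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, values))
        if any(assign[v] != val for v, val in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint_prob(assign, bn)
        den += p
        if assign[query] == 1:
            num += p
    return num / den


# Hypothetical network: two candidate causes C1, C2 and a fault node F.
bn = {
    "C1": ((), {(): 0.1}),
    "C2": ((), {(): 0.3}),
    "F": (("C1", "C2"), {(0, 0): 0.01, (0, 1): 0.4, (1, 0): 0.8, (1, 1): 0.95}),
}
for cause in ("C1", "C2"):
    print(cause, round(conditional(bn, "F", {cause: 1}), 3))
# C1 0.845
# C2 0.455
```

An efficient inference method would be expected to approximate these exact values at a fraction of the cost on large networks. That is why exact enumeration is useful mainly as a correctness check on toy models.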
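The abstract then applies the Blame mechanism from causal models. In Chockler and Halpern's structural-model definition, the degree of responsibility of X = x for an outcome φ is 1/(k+1). Here k is the minimum number of other variables that must be changed before φ counterfactually depends on X. The degree of blame is the expected responsibility over a probability distribution on contexts. The abstract does not say how the paper combines this with the RCBN probabilities. The sketch below therefore illustrates only the general definition, on a hypothetical flat model. In that model the causes are independent binary variables, the fault is a Boolean function of them, and independent priors describe the causes. All names and the model itself are assumptions.

```python
import itertools


def responsibility(phi, state, x):
    """Degree of responsibility of x (with state[x] == 1) for phi.

    Assumes a flat model: binary causes with no causal links among them,
    and phi maps a dict {var: 0/1} -> bool (True means the fault occurs).
    """
    if not phi(state):
        return 0.0  # the fault did not occur, so x is not a cause of it
    others = [v for v in state if v != x]
    # Try contingency sets W in order of increasing size k.
    for k in range(len(others) + 1):
        for w in itertools.combinations(others, k):
            s = dict(state)
            for v in w:
                s[v] = 1 - s[v]  # contingency: change the variables in W
            if not phi(s):
                continue  # the fault must still hold with x at its actual value
            s[x] = 1 - s[x]
            if not phi(s):  # under W, flipping x now removes the fault
                return 1.0 / (k + 1)
    return 0.0


def blame(phi, priors, x):
    """Expected responsibility of setting x := 1, with the other causes drawn
    independently from priors (var -> P(var = 1)); an illustrative assumption."""
    others = [v for v in priors if v != x]
    total = 0.0
    for values in itertools.product((0, 1), repeat=len(others)):
        state = dict(zip(others, values))
        p = 1.0
        for v, val in state.items():
            p *= priors[v] if val == 1 else 1.0 - priors[v]
        state[x] = 1  # the action X <- 1
        total += p * responsibility(phi, state, x)
    return total


# Hypothetical fault: raised when at least two of three binary causes are active.
def fault(s):
    return s["C1"] + s["C2"] + s["C3"] >= 2


print(responsibility(fault, {"C1": 1, "C2": 1, "C3": 1}, "C1"))  # 0.5
print(blame(fault, {"C1": 0.5, "C2": 0.5, "C3": 0.5}, "C1"))     # 0.625
```

When all three causes are active, C1 is only partly responsible: one other cause must be switched off before C1 becomes decisive, giving k = 1. Averaging over the hypothetical priors gives C1's blame. Ranking causes by such a score is one plausible way to order root-cause candidates, but that ranking step is an assumption here.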