Computer Science ›› 2022, Vol. 49 ›› Issue (12): 125-135. doi: 10.11896/jsjkx.220200106

• Computer Software •

Study on Anomaly Detection and Real-time Reliability Evaluation of Complex Component System Based on Log of Cloud Platform

WANG Bo1,2,3, HUA Qing-yi1, SHU Xin-feng2

  1. School of Information Science and Technology, Northwest University, Xi’an 710119, China
  2. School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  3. Shaanxi Key Laboratory of Network Data Intelligent Processing, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
  • Received: 2022-02-17 Revised: 2022-06-05 Published: 2022-12-14
  • Corresponding author: HUA Qing-yi (nwuhuaqingyi@163.com)
  • About author: WANG Bo (wangbo@xupt.edu.cn), born in 1976, Ph.D, lecturer, is a member of China Computer Federation. His main research interests include system reliability, software engineering and human-computer interaction. HUA Qing-yi, born in 1956, Ph.D, professor. His main research interests include human-computer interaction, recommender systems, and user interface engineering.
  • Supported by:
    Key Science and Technology Program of Shaanxi Province, China (2016GY-123), Key Research and Development Projects of Shaanxi Province (2020GY-210), Industrial Science and Technology Research Project of Henan Province (212102210418) and National Natural Science Foundation of China (61272286).

Abstract: Reliability, availability and security are three important indicators of software quality, and reliability is the most important of the three. Traditional software reliability evaluation treats the software system as a whole, or treats its invocation structure as static. Software architecture has since changed considerably: characteristics such as autonomy, collaboration, evolution, dynamism and self-adaptation now pervade complex network-structured software systems, and traditional reliability evaluation and prediction methods no longer suit software systems in this environment. In today’s highly informatized society, where “software defines everything”, massive information systems generate large-scale data resources. The heterogeneity, parallelism, complexity and sheer scale of modern information systems make log resources diverse and complex, so accurate log-based analysis and anomaly prediction are particularly important for building safe and reliable systems. The existing literature contains many studies on anomaly prediction and software reliability, but few address real-time software reliability measurement for massive logs and complex components. Covering the whole log-processing pipeline, from log parsing, feature extraction, anomaly detection and prediction evaluation to real-time reliability calculation, this paper uses an ensemble learning model to analyze massive system logs and predict anomalies. Compared with traditional machine learning methods, the model improves the accuracy, recall and F1 score of anomaly prediction. Where the predicted recall is low, the recall is used to correct the real-time reliability evaluation, which considerably improves the accuracy of the real-time reliability. Based on the reliability of the individual components, a system reliability measure based on Markov theory is used to compute the reliability of microservice composite components, providing an accurate data basis and a basis for anomaly localization for intelligent operation and maintenance.

Key words: Log parsing, Anomaly detection, Reliability evaluation, Root cause analysis, Ensemble learning, Complex components

CLC Number:

  • TP311.5
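
The abstract describes two computations without giving their formulas: correcting a component's real-time reliability by the detector's recall, and combining component reliabilities into a composite reliability with a Markov model. The following is a minimal sketch of how such computations could look, not the paper's own method. It assumes that the true number of failures can be estimated as detected failures divided by recall, and it assumes a Cheung-style discrete-time Markov model of control transfer between components. The function names, parameters and sample numbers are all hypothetical.

```python
from typing import List, Sequence


def recall_corrected_reliability(detected_failures: int, recall: float,
                                 total_requests: int) -> float:
    """Estimate reliability when the detector misses some failures.

    Assumption (not from the paper): true failures ~= detected / recall.
    """
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    estimated_failures = detected_failures / recall
    # Clamp to [0, 1] in case the estimate exceeds the request count.
    return min(1.0, max(0.0, 1.0 - estimated_failures / total_requests))


def composite_reliability(transfer: Sequence[Sequence[float]],
                          reliability: Sequence[float],
                          tol: float = 1e-12,
                          max_sweeps: int = 10_000) -> float:
    """Reliability of a composite whose execution starts at component 0.

    transfer[i][j] is the probability that control passes from component i
    to component j; any remaining probability mass in row i means the
    execution terminates after component i. Assumption: Cheung-style model,
    where success_i = R_i * (exit_i + sum_j P_ij * success_j).
    """
    n = len(reliability)
    if len(transfer) != n or any(len(row) != n for row in transfer):
        raise ValueError("transfer must be an n x n matrix")
    exit_prob: List[float] = [max(0.0, 1.0 - sum(row)) for row in transfer]
    success = [0.0] * n
    # Gauss-Seidel sweeps until the success probabilities stop changing.
    for _ in range(max_sweeps):
        delta = 0.0
        for i in range(n):
            onward = exit_prob[i] + sum(
                transfer[i][j] * success[j] for j in range(n))
            updated = reliability[i] * onward
            delta = max(delta, abs(updated - success[i]))
            success[i] = updated
        if delta < tol:
            break
    return success[0]


if __name__ == "__main__":
    # Hypothetical numbers: 18 detected failures, recall 0.6, 1000 requests.
    print(recall_corrected_reliability(18, 0.6, 1000))
    # Hypothetical three-component composite; component 2 is terminal.
    P = [[0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
    R = [0.99, 0.98, 0.97]
    print(composite_reliability(P, R))
```

With a recall of 0.6, the 18 detected failures are scaled up to an estimated 30, so the corrected reliability is lower than the uncorrected 1 - 18/1000. The iterative solver is used only to keep the sketch dependency-free; solving the same linear system directly would give the same result.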