Computer Science (计算机科学), 2024, Vol. 51, Issue (11): 65-72. doi: 10.11896/jsjkx.230900161
卢家伟1, 卢士达2, 刘思思2, 吴承荣1
LU Jiawei1, LU Shida2, LIU Sisi2, WU Chengrong1
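Abstract: Log parsing is a technique for extracting useful information from raw log files, and it can be applied to system fault diagnosis, performance analysis, security auditing, and other areas. Its main challenges stem from the unstructured, diverse, and dynamic nature of log data: different systems and applications may use different log formats, and those formats also change over time. This paper proposes BertLP, an online log parsing method that adapts both to different log sources and to changes in log format. BertLP uses the pre-trained language model Bert together with an adaptive clustering algorithm to identify each word in a log as static or dynamic, and then groups the logs to generate log templates. The method requires neither manually defined log templates or regular expressions nor word frequency statistics; instead, it learns the semantic and structural features of log messages to identify log fields and types automatically. Comparative experiments on multiple public log datasets show that BertLP outperforms existing methods on the log parsing task, improving parsing accuracy by 6.1% over the best existing method.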
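To make the static/dynamic idea concrete, the following is a minimal sketch of online template grouping, not the authors' implementation. It assumes a pluggable predicate `is_dynamic` that decides whether a word is a variable field. In BertLP that decision comes from Bert and adaptive clustering; here a hypothetical stand-in, `naive_is_dynamic`, simply treats any token containing a digit as dynamic. The names `OnlineParser`, `to_template`, and `WILDCARD` are likewise illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

WILDCARD = "<*>"  # placeholder written in place of each dynamic word


def naive_is_dynamic(token: str) -> bool:
    """Hypothetical stand-in for the Bert + adaptive clustering step.

    Assumption for illustration only: a token containing a digit is dynamic.
    """
    return any(ch.isdigit() for ch in token)


def to_template(message: str,
                is_dynamic: Callable[[str], bool] = naive_is_dynamic) -> str:
    """Replace every dynamic word with the wildcard; keep static words."""
    return " ".join(WILDCARD if is_dynamic(tok) else tok
                    for tok in message.split())


class OnlineParser:
    """Processes log messages one at a time and groups them by template."""

    def __init__(self,
                 is_dynamic: Callable[[str], bool] = naive_is_dynamic) -> None:
        self.is_dynamic = is_dynamic
        self.groups: Dict[str, List[str]] = defaultdict(list)

    def parse(self, message: str) -> str:
        """Return the template of one message and record it in its group."""
        template = to_template(message, self.is_dynamic)
        self.groups[template].append(message)
        return template


if __name__ == "__main__":
    parser = OnlineParser()
    for line in [
        "Connection from 10.0.0.1 closed after 35 ms",
        "Connection from 10.0.0.2 closed after 7 ms",
        "User alice logged in",
    ]:
        print(parser.parse(line))
```

The digit heuristic would leave a user name such as `alice` in the template as a static word, which is the kind of case a semantic classifier is meant to handle; swapping in a different `is_dynamic` changes the classification without touching the grouping logic.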